diff --git a/docs/index.md b/docs/index.md
index 3521d88..878b8d0 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,2 +1,73 @@
-# BiDS23
+# Mastering Earth Observation Application Packaging with CWL
 
+This guide supports the BiDS23 "Mastering Earth Observation Application Packaging with CWL" tutorial event, where we will dive into the world of EO Application Packages and explore how to effectively package, share, and execute Earth observation workflows using the Common Workflow Language (CWL) standard.
+
+This tutorial event is designed for developers, scientists, and Earth observation enthusiasts who want to enhance their skills in creating and sharing EO Application Packages.
+
+Whether you are new to CWL or already have some experience, this event will provide valuable insights and practical knowledge to boost your expertise.
+
+During the event, you will learn:
+
+* The fundamentals of EO Application Packages and their role in the Earth observation domain.
+* How to leverage CWL to describe, package, and share workflows.
+* Techniques for incorporating data, code, configuration files, and documentation into an EO Application Package.
+* Best practices for creating portable and reproducible Earth observation workflows.
+* Hands-on exercises to reinforce your understanding and gain practical experience.
+
+This tutorial guides you step by step through the process of creating EO Application Packages using CWL.
+
+The tutorial is structured in several parts:
+
+``` mermaid
+%%{init: { 'logLevel': 'debug', 'theme': 'forest' } }%%
+timeline
+title Mastering EO Application Packaging with CWL
+Part 1 - Water bodies detection
+    : Short introduction
+    : Application steps
+Part 2 - Execution in Python environments
+    : Create Python environment
+    : Run Python script
+Part 3 - Package the Application 1/2
+    : Create and test the containers
+    : Create the CWL CommandLineTool
+    : Run the CWL CommandLineTool with podman
+Part 4 - Package the Application 2/2
+    : CWL Workflow for Sentinel-2 Cloud Native processing
+    : CWL Workflow for Landsat-9 processing (includes stage-in/out)
+    : CWL Workflow of workflows
+Part 5 - Release the Application
+    : Continuous Integration
+    : Containers published in a container registry
+    : Application Packages in a package registry
+Part 6 - Execution Scenarios
+    : Execution using a CWL runner (cwltool)
+    : Execution on kubernetes using the calrissian CWL runner
+Part 7 - FAIR Application Packages
+    : Recommendations and Best Practices
+```
+
+The tooling used during each part is listed below:
+
+``` mermaid
+%%{init: { 'logLevel': 'debug', 'theme': 'forest' } }%%
+timeline
+title Tooling
+Part 1 - Water bodies detection
+    : N/A
+Part 2 - Execution in Python environments
+    : python venv
+    : python
+Part 3 - Package the Application 1/2
+    : podman
+    : cwltool
+Part 4 - Package the Application 2/2
+    : cwltool
+Part 5 - Release the Application
+    : N/A
+Part 6 - Execution Scenarios
+    : cwltool
+    : calrissian
+Part 7 - FAIR Application Packages
+    : N/A
+```
\ No newline at end of file
diff --git a/docs/new-to/cwl.md b/docs/new-to/cwl.md
new file mode 100644
index 0000000..ba28db9
--- /dev/null
+++ b/docs/new-to/cwl.md
@@ -0,0 +1,70 @@
+
+The paper [_Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language_](https://arxiv.org/abs/2105.07028) provides an excellent description of the Common Workflow Language project producing free and open standards for describing command-line
+tool based workflows.
+
+## TL;DR
+
+Although the paper provides a clear and concise description of the CWL standards, here is a short summary of the main points, covering the concepts this guide relies on.
+
+### CWL Key Insights
+
+1. CWL is a set of standards for describing and sharing computational workflows.
+
+2. The CWL standards are used daily in many science and engineering domains, including by multi-stakeholder teams.
+
+3. The CWL standards use a declarative syntax, facilitating polylingual workflow tasks. By being explicit about the run-time environment and any use of software containers, the CWL standards enable portability and reuse.
+
+4. The CWL standards provide a separation of concerns between workflow authors and workflow platforms.
+
+5. The CWL standards support critical workflow concepts like automation, scalability, abstraction, provenance, portability, and reusability.
+
+6. The CWL standards are developed around core principles of community and shared decision-making, re-use, and zero cost for participants.
+
+7. The CWL standards are provided as freely available open standards, supported by a diverse community in collaboration with industry, and form a Free/Open Source Software ecosystem.
+
+### CWL Features
+
+The CWL standards support polylingual and multi-party workflows and include two main components:
+
+1. A standard for describing command line tools
+
+2. A standard for describing workflows that compose such tool descriptions
+
+The CWL standards define an explicit language with a textual syntax derived from YAML.
+
+#### CWL Command Line Tool Description Standard
+
+The CWL Command Line Tool Description Standard describes:
+
+- how a particular command line tool works: what the
+inputs and parameters are, and their types
+- how to add the correct flags and switches to the command line invocation
+- where to find the output files
+
+#### CWL Workflow Description Standard
+
+The CWL Workflow Description Standard uses the same YAML-derived textual syntax to make workflow-level inputs, outputs and steps explicit.
+
+Steps are composed of CWL CommandLineTools or CWL sub-workflows, each re-exposing their tool’s required inputs.
+
+Inputs for each step are connected by referencing the name of either the common workflow inputs or particular outputs of other steps.
+
+The workflow outputs expose selected outputs from workflow steps.
+
+Because CWL is a set of standards, workflows are executed using a CWL _runner_, and there are several implementations of such runners.
+
+This guide uses the CWL runner [cwltool](https://pypi.org/project/cwltool).
+
+### Recommendations
+
+- Include documentation and labels for all components to enable the automatic generation of helpful visual depictions for any given CWL description
+
+- Include metadata about the tool
+
+- Include a _Workflow_ class for all CommandLineTools (a single-step Workflow)
+
+- Organize your CWL in several individual files to ease readability and maintenance. Pack your multi-file CWL Workflows (`cwltool --pack`) when needed
+
+## References
+
+- Crusoe, M. R. et al. _Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language_, retrieved from https://arxiv.org/abs/2105.07028
\ No newline at end of file
diff --git a/docs/new-to/yaml.md b/docs/new-to/yaml.md
new file mode 100644
index 0000000..7e08ccc
--- /dev/null
+++ b/docs/new-to/yaml.md
@@ -0,0 +1,175 @@
+## Key-Value Pairs
+
+Fundamentally, a file written in YAML consists of a set of _key-value pairs_.
+Each pair is written as `key: value`,
+where whitespace after the `:` is required.
+Key names in CWL files should not contain whitespace.
+We use [_camelCase_][camelCase] for multi-word key names
+that have special meaning in the CWL specification
+and underscored key names otherwise.
+For example:
+
+```yaml
+first_name: Bilbo
+last_name: Baggins
+age_years: 111
+home: Bag End, Hobbiton
+```
+
+The YAML above defines four keys -
+`first_name`, `last_name`, `age_years`, and `home` -
+with their four respective values.
+Values can be
+character strings,
+numeric (integer, floating point, or scientific representation),
+Boolean (`true` or `false`),
+or more complex nested types (see below).
+
+Values may be wrapped in quotation marks,
+but be aware that this may change the way that they are interpreted:
+`"1234"` will be treated as a character string,
+while `1234` will be treated as an integer.
+This distinction can be important,
+for example when describing parameters to a command:
+in CWL all parts of `baseCommand` must be strings so,
+if you want to specify a fixed numeric value to a command,
+make sure that you wrap that numeric value in quotes: `baseCommand: [echo, "42"]`.
+
+## Comments
+
+You may use `#` to add comments to your CWL and parameter files.
+Any characters to the right of ` #` will be ignored by the program interpreting
+the YAML.
+For example:
+
+```yaml
+first_name: Bilbo
+last_name: Baggins
+age_years: 111
+# this line will be ignored by the interpreter
+home: Bag End, Hobbiton # this is ignored too
+```
+
+If there is anything on the line before the comment,
+be sure to add at least one space before the `#`!
+
+## Maps
+
+When describing a tool or workflow with CWL,
+it is usually necessary to construct more complex, nested representations.
+Called _maps_,
+these hierarchical structures are described in YAML by providing
+additional key-value pairs as the value of any key.
+These pairs (sometimes referred to as "children") are written
+on new lines under the key to which they belong (the "parent"),
+and should be indented with two spaces
+(⇥tab characters are not allowed).
+For example:
+
+```yaml
+cwlVersion: v1.0
+class: CommandLineTool
+baseCommand: echo
+inputs: # this key has an object value
+  example_flag: # so does this one
+    type: boolean
+    inputBinding: # and this one too
+      position: 1
+      prefix: -f
+```
+
+The YAML above illustrates how you can build up complex nested object
+descriptions relatively quickly.
+The `inputs` map contains a single key, `example_flag`,
+which itself contains two keys, `type` and `inputBinding`,
+while one of these children, `inputBinding`,
+contains a further two key-value pairs (`position` and `prefix`).
+See the [Arrays](#arrays) section below for more information about providing multiple
+values/key-value pairs for a single key.
+For comparison with the example YAML above,
+here is a graphical representation of the `inputs` object it describes.
+
+
+graph TD
+  inputs --> example_flag
+  example_flag --> type
+  type --- bool((boolean))
+  example_flag --> inputBinding
+  inputBinding --> position
+  inputBinding --> prefix
+  position --- posval((1))
+  prefix --- prefval(('-f'))
+
+
+## Arrays
+
+In certain circumstances it is necessary to provide
+multiple values or objects for a single key.
+As we've already seen in the [Maps](#maps) section above,
+more than one key-value pair can be mapped to a single key.
+However, it is also possible to define multiple values for a key
+without having to provide a unique key for each value.
+We can achieve this with an _array_,
+where each value is defined on its own line and preceded by `-`.
+For example:
+
+```yaml
+touchfiles:
+  - foo.txt
+  - bar.dat
+  - baz.txt
+```
+
+and a more complex example combining maps and arrays:
+
+```yaml
+exclusive_parameters:
+  type:
+    - type: record
+      name: itemC
+      fields:
+        itemC:
+          type: string
+          inputBinding:
+            prefix: -C
+    - type: record
+      name: itemD
+      fields:
+        itemD:
+          type: string
+          inputBinding:
+            prefix: -D
+```
+
+## JSON Style
+
+YAML is a superset of [JavaScript Object Notation (JSON)][json]
+and maps and arrays can also be defined in YAML using the native JSON syntax.
+For example:
+
+```yaml
+touchfiles: [foo.txt, bar.dat, baz.txt] # equivalent to first Arrays example
+```
+
+and:
+
+```yaml
+# equivalent to the `inputs` example in "Maps" above
+inputs: {example_flag: {type: boolean, inputBinding: {position: 1, prefix: -f}}}
+```
+
+Native JSON can be useful
+to indicate where a field is being left intentionally empty
+(such as `[]` for an empty array),
+and where it makes more sense
+for the values to be located on the same line
+(such as when providing option flags and their values in a shell command).
+However, as the second example above shows,
+it can severely affect the readability of a YAML file
+and should be used sparingly.
+
+
+
+## Reference
+
+This page is the same as http://www.commonwl.org/user_guide/yaml/
\ No newline at end of file
diff --git a/docs/release/ci.md b/docs/release/ci.md
index 9314cd6..2db54ee 100644
--- a/docs/release/ci.md
+++ b/docs/release/ci.md
@@ -30,7 +30,7 @@ This is depicted below:
 
 ``` mermaid
 graph TB
-SCM[(software registry)]
+SCM[(software repository)]
 SCM -- CWL Workflow --> A
 SCM -- codemeta.json --> B
 A(validate CWL Workflow) --> B(extract version)
diff --git a/mkdocs.yml b/mkdocs.yml
index f517a9f..4f09fde 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -18,7 +18,7 @@ theme:
 plugins:
   - search
   - mermaid2:
-      version: 10.0.2
+      version: 10.6.0
 
 markdown_extensions:
   - pymdownx.details
@@ -44,16 +44,16 @@ markdown_extensions:
       line_spans: __span
 extra_css:
   - styles/css/app.css
-  - https://unpkg.com/mermaid@8.9.1/dist/mermaid.css
+# - https://unpkg.com/mermaid@8.9.1/dist/mermaid.css
 extra_javascript:
   - javascripts/config.js
-  - https://unpkg.com/mermaid@8.9.1/dist/mermaid.min.js
+# - https://unpkg.com/mermaid@8.9.1/dist/mermaid.min.js
   - https://polyfill.io/v3/polyfill.min.js?features=es6
   - https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js
 nav:
-  # - Home: 'index.md'
+  - Home: 'index.md'
   - Earth Observation Application Package:
     - OGC Context: app-package/ogc-context.md
     - Introducing the application:
@@ -91,17 +91,18 @@ nav:
     - Staged Landsat-9 Workflow:
       - Understanding stage-in/out: cwl-workflow/stage-in-out.md
       - CWL CommandLineTool for stage-in: cwl-workflow/stage-in.md
-      - Processing a staged Landsat-9 acquisition: cwl-workflow/staged.md
-    - Sentinel-2 Workflow of Workflow: cwl-workflow/scatter-cloud-native.md
+      - TODO CWL CommandLineTool for stage-out: cwl-workflow/stage-out.md
+      - TODO Processing a staged Landsat-9 acquisition: cwl-workflow/staged.md
+    - TODO Sentinel-2 Workflow of Workflow: cwl-workflow/scatter-cloud-native.md
   - Release the EO Application:
-    - Scope: release/scope.md
+    - TODO Scope: release/scope.md
     - Continuous Integration: release/ci.md
   - Execution Scenarios:
     - Running on a local machine:
-      - Sentinel-2: cwl-workflow/exec-cloud-native.md
-      - Landsat-9: cwl-workflow/exec-stage-in.md
+      - TODO Sentinel-2: cwl-workflow/exec-cloud-native.md
+      - TODO Landsat-9: cwl-workflow/exec-stage-in.md
     - Running CWL Workflow on Kubernetes:
-      - Run the CWL Workflow with calrissian: kubernetes/calrissian.md
+      - TODO Run the CWL Workflow with calrissian: kubernetes/calrissian.md
   - FAIR Application Packages: fair/best-practice.md
   - Reference:
     - OGC Application Package Best Practice: reference/ogc-ap-bp.md
@@ -109,4 +110,6 @@ nav:
     - CWL Workflow: reference/cwl-workflow.md
     - Tools:
       - cwltool: reference/cwltool.md
-      - calrissian: reference/calrissian.md
\ No newline at end of file
+      - calrissian: reference/calrissian.md
+    - New to YAML: new-to/yaml.md
+    - New to CWL: new-to/cwl.md
\ No newline at end of file