update
fabricebrito committed Nov 1, 2023
1 parent 5849265 commit ebfc2f2
Showing 5 changed files with 332 additions and 13 deletions.
73 changes: 72 additions & 1 deletion docs/index.md
@@ -1,2 +1,73 @@
# BiDS23
# Mastering Earth Observation Application Packaging with CWL

This guide supports the BiDS23 "Mastering Earth Observation Application Packaging with CWL" tutorial event, where we will dive into the world of EO Application Packages and explore how to effectively package, share, and execute Earth observation workflows using the Common Workflow Language (CWL) standard.

This tutorial event is designed for developers, scientists, and Earth observation enthusiasts who want to enhance their skills in creating and sharing EO Application Packages.

Whether you are new to CWL or already have some experience, this event will provide valuable insights and practical knowledge to boost your expertise.

During the event, you will learn:

* The fundamentals of EO Application Packages and their role in the Earth observation domain.
* How to leverage CWL to describe, package, and share workflows.
* Techniques for incorporating data, code, configuration files, and documentation into an EO Application Package.
* Best practices for creating portable and reproducible Earth observation workflows.
* Hands-on exercises to reinforce your understanding and gain practical experience.

This guide walks you through the process step by step, demonstrating how to create EO Application Packages using CWL.

The tutorial is structured in several parts:

``` mermaid
%%{init: { 'logLevel': 'debug', 'theme': 'forest' } }%%
timeline
title Mastering EO Application Packaging with CWL
Part 1 - Water bodies detection
: Short introduction
: Application steps
Part 2 - Execution in Python environments
: Create Python environment
: Run Python script
Part 3 - Package the Application 1/2
: Create and test the containers
: Create the CWL CommandLineTool
: Run the CWL CommandLineTool with podman
Part 4 - Package the Application 2/2
: CWL Workflow for Sentinel-2 Cloud Native processing
: CWL Workflow for Landsat-9 processing (includes stage-in/out)
: CWL Workflow of workflows
Part 5 - Release the Application
: Continuous Integration
: Containers published in a container registry
: Application Packages in a package registry
Part 6 - Execution Scenarios
: Execution using a CWL runner (cwltool)
: Execution on kubernetes using the calrissian CWL runner
Part 7 - FAIR Application Packages
: Recommendations and Best Practices
```

The tooling used during each part is listed below:

``` mermaid
%%{init: { 'logLevel': 'debug', 'theme': 'forest' } }%%
timeline
title Tooling
Part 1 - Water bodies detection
: N/A
Part 2 - Execution in Python environments
: python venv
: python
Part 3 - Package the Application 1/2
: podman
: cwltool
Part 4 - Package the Application 2/2
: cwltool
Part 5 - Release the Application
: N/A
Part 6 - Execution Scenarios
: cwltool
: calrissian
Part 7 - FAIR Application Packages
: N/A
```
70 changes: 70 additions & 0 deletions docs/new-to/cwl.md
@@ -0,0 +1,70 @@

The paper [_Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language_](https://arxiv.org/abs/2105.07028) provides an excellent description of the Common Workflow Language (CWL) project, which produces free and open standards for describing command-line
tool based workflows.

## TL;DR

The paper describes the CWL standards clearly and concisely; the short summary below covers the main points needed to follow this guide.

### CWL Key Insights

1. CWL is a set of standards for describing and sharing computational workflows.

2. The CWL standards are used daily in many science and engineering domains, including by multi-stakeholder teams.

3. The CWL standards use a declarative syntax, facilitating polylingual workflow tasks. By being explicit about the run-time environment and any use of software containers, the CWL standards enable portability and reuse.

4. The CWL standards provide a separation of concerns between workflow authors and workflow platforms.

5. The CWL standards support critical workflow concepts like automation, scalability, abstraction, provenance, portability, and reusability.

6. The CWL standards are developed around core principles of community and shared decision-making, re-use, and zero cost for participants.

7. The CWL standards are provided as freely available open standards, supported by a diverse community in collaboration with industry, and are a Free/Open Source Software ecosystem.

### CWL Features

The CWL standards support polylingual and multi-party workflows and include two main components:

1. A standard for describing command line tools

2. A standard for describing workflows that compose such tool descriptions

The CWL standards define an explicit language with a textual syntax derived from YAML.

#### CWL Command Line Tool Description Standard

The CWL Command Line Tool Description Standard describes:

- how a particular command line tool works: its inputs and parameters, and their types
- how to add the correct flags and switches to the command line invocation
- where to find the output files
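
As an illustration of the three points above, here is a minimal sketch of a CommandLineTool description. The tool itself, the input name `message`, and the file name `message.txt` are hypothetical and chosen only for this example.

```yaml
cwlVersion: v1.2
class: CommandLineTool
label: Write a message to a file   # hypothetical example tool
baseCommand: echo                  # the command line tool being described
inputs:
  message:                         # input name is an assumption for this sketch
    type: string                   # inputs and parameters are typed
    inputBinding:
      position: 1                  # where the value goes on the command line
stdout: message.txt                # capture standard output in this file
outputs:
  result:
    type: File
    outputBinding:
      glob: message.txt            # where the runner looks for the output file
```

The `inputBinding` section controls how the value is placed on the command line, while `outputBinding.glob` tells the runner which file to collect once the command has finished.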

#### CWL Workflow Description Standard

The CWL Workflow Description Standard uses the same YAML-derived textual syntax to make workflow-level inputs, outputs and steps explicit.

Steps consist of CWL CommandLineTools or CWL sub-workflows, each re-exposing its tool's required inputs.

Inputs for each step are connected by referencing the name of either the common workflow inputs or particular outputs of other steps.

The workflow outputs expose selected outputs from workflow steps.
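
The wiring described above can be sketched with a small two-step Workflow. The file names `write-message.cwl` and `count-lines.cwl`, as well as all input, output, and step names, are hypothetical placeholders rather than files shipped with this guide.

```yaml
cwlVersion: v1.2
class: Workflow
label: Write a message, then count its lines   # hypothetical example
inputs:
  message: string                  # workflow-level input
outputs:
  line_count:
    type: File
    outputSource: count/counted    # expose a selected output of a step
steps:
  write:
    run: write-message.cwl         # hypothetical CommandLineTool file
    in:
      message: message             # connected to the workflow input
    out: [result]
  count:
    run: count-lines.cwl           # hypothetical CommandLineTool file
    in:
      text_file: write/result      # connected to the output of another step
    out: [counted]
```

Because the `count` step references `write/result`, a runner can infer that `write` must complete before `count` starts; the order in which steps are listed carries no meaning.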

Because CWL is a set of standards rather than a single piece of software, workflows are executed using a CWL _runner_, and several implementations of such runners exist.

This guide uses the CWL runner [cwltool](https://pypi.org/project/cwltool).

### Recommendations

- Include documentation and labels for all components to enable the automatic generation of helpful visual depictions for any given CWL description

- Include metadata about the tool

- Wrap every CommandLineTool in a _Workflow_ class (a single-step Workflow)

- Organize your CWL in several individual files to ease readability and maintenance. Pack your multi-file CWL Workflows (`cwltool --pack`) when needed
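
As a rough sketch of how these recommendations can look in practice, the document below combines labels, documentation, a metadata field, and a single-step Workflow wrapping a CommandLineTool. It is shown as one `$graph` document for brevity; all identifiers, labels, and the version value are assumptions made up for this example.

```yaml
cwlVersion: v1.2
$namespaces:
  s: https://schema.org/
s:softwareVersion: 0.1.0           # metadata about the tool; placeholder value
$graph:
  - class: Workflow                # single-step Workflow wrapping the tool
    id: main
    label: Echo a message
    doc: Single-step Workflow wrapping the echo CommandLineTool
    inputs:
      message:
        type: string
        label: Message
        doc: Text to print
    outputs:
      result:
        type: File
        outputSource: step_echo/result
    steps:
      step_echo:
        run: "#echo"               # reference to the tool defined below
        in:
          message: message
        out: [result]
  - class: CommandLineTool
    id: echo
    label: echo
    doc: Prints its argument to standard output
    baseCommand: echo
    inputs:
      message:
        type: string
        inputBinding:
          position: 1
    stdout: message.txt
    outputs:
      result:
        type: File
        outputBinding:
          glob: message.txt
```

The `label` and `doc` fields are what documentation and visualisation tools can pick up, and the `s:` prefixed field illustrates one way of attaching schema.org metadata.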

## References

- Crusoe, M. R. et al. _Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language_, retrieved from https://arxiv.org/abs/2105.07028
175 changes: 175 additions & 0 deletions docs/new-to/yaml.md
@@ -0,0 +1,175 @@
## Key-Value Pairs

Fundamentally, a file written in YAML consists of a set of _key-value pairs_.
Each pair is written as `key: value`,
where whitespace after the `:` is required.
Key names in CWL files should not contain whitespace.
We use [_camelCase_][camelCase] for multi-word key names
that have special meaning in the CWL specification
and underscored key names otherwise.
For example:

```yaml
first_name: Bilbo
last_name: Baggins
age_years: 111
home: Bag End, Hobbiton
```
The YAML above defines four keys -
`first_name`, `last_name`, `age_years`, and `home` -
with their four respective values.
Values can be
character strings,
numeric (integer, floating point, or scientific representation),
Boolean (`true` or `false`),
or more complex nested types (see below).

Values may be wrapped in quotation marks,
but be aware that this may change the way that they are interpreted,
i.e. `"1234"` will be treated as a character string,
while `1234` will be treated as an integer.
This distinction can be important,
for example when describing parameters to a command:
in CWL all parts of `baseCommand` must be strings so,
if you want to specify a fixed numeric value to a command,
make sure that you wrap that numeric value in quotes: `baseCommand: [echo, "42"]`.
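
The effect of quoting can be seen side by side in the short sketch below. The key names are invented for this illustration and have no meaning in CWL, apart from `baseCommand`.

```yaml
house_number: 1234          # integer
postal_code: "1234"         # character string, because of the quotes
is_hobbit: true             # Boolean
is_hobbit_text: "true"      # character string, because of the quotes
baseCommand: [echo, "42"]   # both items are strings, as CWL requires
```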

## Comments

You may use `#` to add comments to your CWL and parameter files.
Any characters to the right of ` #` will be ignored by the program interpreting
the YAML.
For example:

```yaml
first_name: Bilbo
last_name: Baggins
age_years: 111
# this line will be ignored by the interpreter
home: Bag End, Hobbiton # this is ignored too
```

If there is anything on the line before the comment,
be sure to add at least one space before the `#`!
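
To illustrate why the space matters, compare the two lines in this small sketch (the `room` key is made up for the example):

```yaml
home: Bag End, Hobbiton # a comment: the value is "Bag End, Hobbiton"
room: Bag End#1         # no space before the first '#', so the value is "Bag End#1"
```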

## Maps

When describing a tool or workflow with CWL,
it is usually necessary to construct more complex, nested representations.
Called _maps_,
these hierarchical structures are described in YAML by providing
additional key-value pairs as the value of any key.
These pairs (sometimes referred to as "children") are written
on new lines under the key to which they belong (the "parent"),
and should be indented with two spaces
(⇥tab characters are not allowed).
For example:

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
inputs:  # this key has an object value
  example_flag:  # so does this one
    type: boolean
    inputBinding:  # and this one too
      position: 1
      prefix: -f
```

The YAML above illustrates how you can build up complex nested object
descriptions relatively quickly.
The `inputs` map contains a single key, `example_flag`,
which itself contains two keys, `type` and `inputBinding`,
while one of these children, `inputBinding`,
contains a further two key-value pairs (`position` and `prefix`).
See the [Arrays](#arrays) section below for more information about providing multiple
values/key-value pairs for a single key.
For comparison with the example YAML above,
here is a graphical representation of the `inputs` object it describes.

<div class="mermaid">
graph TD
inputs --> example_flag
example_flag --> type
type --- bool((boolean))
example_flag --> inputBinding
inputBinding --> position
inputBinding --> prefix
position --- posval((1))
prefix --- prefval(('-f'))
</div>

## Arrays

In certain circumstances it is necessary to provide
multiple values or objects for a single key.
As we've already seen in the [Maps](#maps) section above,
more than one key-value pair can be mapped to a single key.
However, it is also possible to define multiple values for a key
without having to provide a unique key for each value.
We can achieve this with an _array_,
where each value is defined on its own line and preceded by `-`.
For example:

```yaml
touchfiles:
  - foo.txt
  - bar.dat
  - baz.txt
```

and a more complex example combining maps and arrays:

```yaml
exclusive_parameters:
  type:
    - type: record
      name: itemC
      fields:
        itemC:
          type: string
          inputBinding:
            prefix: -C
    - type: record
      name: itemD
      fields:
        itemD:
          type: string
          inputBinding:
            prefix: -D
```

## JSON Style

YAML is based on [JavaScript Object Notation (JSON)][json]
and maps and arrays can also be defined in YAML using the native JSON syntax.
For example:

```yaml
touchfiles: [foo.txt, bar.dat, baz.txt] # equivalent to first Arrays example
```

and:

```yaml
# equivalent to the `inputs` example in "Maps" above
inputs: {example_flag: {type: boolean, inputBinding: {position: 1, prefix: -f}}}
```
Native JSON can be useful
to indicate where a field is being left intentionally empty
(such as `[]` for an empty array),
and where it makes more sense
for the values to be located on the same line
(such as when providing option flags and their values in a shell command).
However, as the second example above shows,
it can severely affect the readability of a YAML file
and should be used sparingly.
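
The two uses mentioned above can be sketched in a single small tool description. This is an illustrative example only; the command and its arguments are assumptions, not part of the tutorial application.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [echo, -n, "42"]   # flag and value kept together on one line
inputs: []                      # intentionally empty: the tool takes no inputs
outputs: []                     # intentionally empty: no outputs are collected
```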



## Reference

This page is the same as http://www.commonwl.org/user_guide/yaml/
2 changes: 1 addition & 1 deletion docs/release/ci.md
@@ -30,7 +30,7 @@ This is depicted below:

``` mermaid
graph TB
SCM[(software registry)]
SCM[(software repository)]
SCM -- CWL Workflow --> A
SCM -- codemeta.json --> B
A(validate CWL Workflow) --> B(extract version)
25 changes: 14 additions & 11 deletions mkdocs.yml
@@ -18,7 +18,7 @@ theme:
plugins:
- search
- mermaid2:
version: 10.0.2
version: 10.6.0

markdown_extensions:
- pymdownx.details
@@ -44,16 +44,16 @@ markdown_extensions:
line_spans: __span
extra_css:
- styles/css/app.css
- https://unpkg.com/[email protected]/dist/mermaid.css
# - https://unpkg.com/[email protected]/dist/mermaid.css

extra_javascript:
- javascripts/config.js
- https://unpkg.com/[email protected]/dist/mermaid.min.js
# - https://unpkg.com/[email protected]/dist/mermaid.min.js
- https://polyfill.io/v3/polyfill.min.js?features=es6
- https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js

nav:
# - Home: 'index.md'
- Home: 'index.md'
- Earth Observation Application Package:
- OGC Context: app-package/ogc-context.md
- Introducing the application:
@@ -91,22 +91,25 @@ nav:
- Staged Landsat-9 Workflow:
- Understanding stage-in/out: cwl-workflow/stage-in-out.md
- CWL CommandLineTool for stage-in: cwl-workflow/stage-in.md
- Processing a staged Landsat-9 acquisition: cwl-workflow/staged.md
- Sentinel-2 Workflow of Workflow: cwl-workflow/scatter-cloud-native.md
- TODO CWL CommandLineTool for stage-out: cwl-workflow/stage-out.md
- TODO Processing a staged Landsat-9 acquisition: cwl-workflow/staged.md
- TODO Sentinel-2 Workflow of Workflow: cwl-workflow/scatter-cloud-native.md
- Release the EO Application:
- Scope: release/scope.md
- TODO Scope: release/scope.md
- Continuous Integration: release/ci.md
- Execution Scenarios:
- Running on a local machine:
- Sentinel-2: cwl-workflow/exec-cloud-native.md
- Landsat-9: cwl-workflow/exec-stage-in.md
- TODO Sentinel-2: cwl-workflow/exec-cloud-native.md
- TODO Landsat-9: cwl-workflow/exec-stage-in.md
- Running CWL Workflow on Kubernetes:
- Run the CWL Workflow with calrissian: kubernetes/calrissian.md
- TODO Run the CWL Workflow with calrissian: kubernetes/calrissian.md
- FAIR Application Packages: fair/best-practice.md
- Reference:
- OGC Application Package Best Practice: reference/ogc-ap-bp.md
- CWL CommandLineTool: reference/cwl-commandlinetool.md
- CWL Workflow: reference/cwl-workflow.md
- Tools:
- cwltool: reference/cwltool.md
- calrissian: reference/calrissian.md
- calrissian: reference/calrissian.md
- New to YAML: new-to/yaml.md
- New to CWL: new-to/cwl.md
