status: proposed
title: Workspace Hinting
creation-date: 2021-09-03
last-updated: 2021-10-26
authors: @sbwsg

TEP-0082: Workspace Hinting

Summary

Workspaces allow Task authors to declare portions of their Task's filesystem to be supplied at runtime by TaskRuns or PipelineRuns. For example, a Task may accept a credential via an optional Workspace, and a TaskRun might supply it from a Secret. Another Task might write source code to a Workspace, and a PipelineRun could bind a Persistent Volume to it so the source can be passed to other PipelineTasks.

Rephrasing this slightly: the interface that Workspaces expose is general-purpose and caters to a number of largely disjoint use-cases. A downside of this is that Task authors can't communicate a Workspace's intended usage in a machine-readable way. There's no way for an author to indicate "this Workspace is intended to accept a credential" or "this Workspace should be supplied with configuration". Similarly, Pipeline authors have no way to hint that a Workspace is used to shuttle data between Tasks. They can write a human-readable description as part of the Workspace declaration, but that's essentially useless to an automated system constructing TaskRuns and PipelineRuns.

The purpose of this TEP is to allow Task and Pipeline authors to "hint" about the intended purpose of a Workspace. The idea is that if authors can mark Workspaces with a purpose then automated systems could be designed to submit reasonable default bindings for them.

Motivation

Goals

  • Provide a way for Tasks and Pipelines to declare the purpose of Workspaces in a machine-readable format.

Non-Goals

  • Adding constraint-checking or any other logic to Pipelines to validate bound Workspaces based on workspace hints. The scope of a feature like this would be deceptively large. This TEP is trying to hold focus on the "external system" / "machine-readable" use-case. In the future we may want to build higher-level abstractions related to this proposal which could leverage hinting.

Use Cases (optional)

The Tekton Workflows project is currently exploring ways to pass Secrets from a high-level Workflow description into a PipelineRun. This is made considerably more difficult because Pipelines can't indicate which of their Workspaces might be the right one to bind those Secrets to. See the Aug 31, 2021 Workflows WG Meeting Notes.

Requirements

  • Hinting must be optional: we don't want to suddenly invalidate every Task or Pipeline that currently includes a Workspace.

Proposal

Notes/Caveats (optional)

Risks and Mitigations

User Experience (optional)

Performance (optional)

Design Details

Test Plan

Design Evaluation

Drawbacks

Alternatives

At this stage in the proposal we're just capturing some options to consider. As we move to implementable we'll settle this design on one of them and flesh it out more fully.

Embed Default Workspace Bindings in Tasks/Pipelines

cf. Pipelines#4083

Allow Task and Pipeline authors to explicitly declare some or all of a default Workspace Binding which PipelineRuns and TaskRuns can use or override:

kind: Task
spec:
  workspaces:
  - name: docker-json
    mountPath: /wherever/docker/json/goes
    default:
      secret:
        secretName: my-docker-json

A TaskRun referencing this Task could either provide a docker-json Workspace or omit it. If omitted, the Task's default would be used.
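
The resolution rule described here can be sketched as follows. This is a minimal Python sketch, not the Pipelines controller's implementation: the `resolve_binding` helper and the `team-docker-json` Secret name are hypothetical, and the handling of optional Workspaces and of Workspaces with neither a binding nor a default is an assumption that the proposal does not spell out.

```python
from typing import Optional


def resolve_binding(declaration: dict, run_bindings: dict) -> Optional[dict]:
    """Return the binding to use for one Workspace declaration.

    `declaration` mirrors an entry under spec.workspaces in the Task;
    `run_bindings` maps Workspace names to bindings supplied by the TaskRun.
    """
    name = declaration["name"]
    if name in run_bindings:
        # An explicit binding on the TaskRun always wins.
        return run_bindings[name]
    if "default" in declaration:
        # Otherwise fall back to the default embedded in the Task.
        return declaration["default"]
    if declaration.get("optional", False):
        # Assumption: optional Workspaces may stay unbound.
        return None
    raise ValueError(f"workspace {name!r} has no binding and no default")


docker_json = {
    "name": "docker-json",
    "mountPath": "/wherever/docker/json/goes",
    "default": {"secret": {"secretName": "my-docker-json"}},
}

# The TaskRun omits the Workspace, so the Task's default applies.
omitted = resolve_binding(docker_json, {})

# The TaskRun supplies its own (hypothetical) Secret, overriding the default.
overridden = resolve_binding(
    docker_json,
    {"docker-json": {"secret": {"secretName": "team-docker-json"}}},
)
```

Even in this small form the sketch shows why this alternative is more than an API change: something on the controller side has to make the choice between the run's binding and the Task's default.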

A Pipeline could take a similar approach with a volumeClaimTemplate:

kind: Pipeline
spec:
  workspaces:
  - name: shared-data
    default:
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 256M

Pros:

  • Very explicit about the Task's expectations for the content provided.
  • Pipelines can override a Task's expected types - so, for example, a Task might expect a ConfigMap but a Pipeline might override with a PV instead.
  • Offers its own benefits beyond hinting, such as being able to offer a Pipeline that "just works" out of the box without any tricky workspace configuration in the PipelineRun.
  • Doesn't preclude adding more explicit hinting later.

Cons:

  • Not just an API change - there will be some logic involved here on the Pipelines controller side to apply the default workspace config to runs.
  • Questions around future extensions stemming from this change are quite nuanced:
    • What if an author wants the "default" to actually be a requirement and for the TaskRun to fail if it's missing? For example a deploy Task requiring a Secret with name "cluster-key" to exist in the TaskRun's namespace.
    • What if an author wants to use a default ConfigMap only if that default exists in the TaskRun's namespace but otherwise a fallback like an emptyDir?
    • What if a Catalog Pipeline author attaches a PersistentVolume type or StorageClass that is only available on a subset of cloud providers?

Add a Hinting Field to Workspace Declarations

The precise name of this field can be iterated on but for now let's assume "profile".

A Workspace Declaration in a Task or Pipeline can include a profile field that is a string matching a fixed set of available options:

  • "cache" to hint that the workspace will be used as a cache for performance or reproducibility (e.g. a system might require all teams to use a shared node_modules directory when compiling their frontends).
  • "config" to hint that the workspace is intended to supply some configuration or settings.
  • "credential" to hint that the workspace will be used to perform authenticated actions.
  • "data" to hint that the contents are arbitrary data either consumed or produced by the Task.

A third-party system can attach its own meaning to each of these profiles. A "credential" Workspace could be populated from a Secret or Secret-like volume. A "cache" Workspace might be supplied with a long-lived read-only Persistent Volume. A "data" Workspace might be assumed to require an ephemeral Persistent Volume that lives only as long as the PipelineRun. "config" Workspaces could map consistently to ConfigMaps. Importantly: these decisions are left up to the external system / platform. Our own Workflows project may be able to utilize these profiles, for example, to make informed choices when creating a PipelineRun based solely on a Pipeline's YAML, a supplied list of volumes, and a set of Secret references.

Here's an example from a git-clone-like Task that accepts an optional GitHub deploy key:

workspaces:
- name: deploy-key
  readOnly: true
  optional: true
  profile: credential
- name: output
  profile: data
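
To make the "external system" side concrete, here is a hedged Python sketch of how a platform might turn these declarations into default bindings. The profile names come from this proposal, but the policy itself (which volume source each profile maps to, the 256M claim size, the `binding_for` helper, and the `github-deploy-key` Secret name) is purely an assumption for illustration. Each platform would be free to choose differently.

```python
from typing import Optional

KNOWN_PROFILES = {"cache", "config", "credential", "data"}


def binding_for(declaration: dict, available: dict) -> Optional[dict]:
    """Pick a binding for one Workspace declaration based on its profile.

    `available` holds resource names this hypothetical platform was given,
    keyed by profile. It is not part of the proposal.
    """
    profile = declaration.get("profile")
    if profile is None:
        return None  # hinting is optional, so unhinted Workspaces are left alone
    if profile not in KNOWN_PROFILES:
        raise ValueError(f"unknown profile {profile!r}")

    if profile == "data":
        # Assumed policy: a claim created per run from a template.
        source = {
            "volumeClaimTemplate": {
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": "256M"}},
                }
            }
        }
    else:
        resource = available.get(profile)
        if resource is None:
            if declaration.get("optional", False):
                return None  # nothing to bind, and the Task tolerates that
            raise ValueError(
                f"no {profile} resource for {declaration['name']!r}"
            )
        if profile == "credential":
            source = {"secret": {"secretName": resource}}
        elif profile == "config":
            source = {"configMap": {"name": resource}}
        else:  # "cache": assumed to be a long-lived, read-only claim
            source = {
                "persistentVolumeClaim": {"claimName": resource, "readOnly": True}
            }

    return {"name": declaration["name"], **source}


# The two declarations from the git-clone-like example.
declarations = [
    {"name": "deploy-key", "readOnly": True, "optional": True, "profile": "credential"},
    {"name": "output", "profile": "data"},
]
bindings = [binding_for(d, {"credential": "github-deploy-key"}) for d in declarations]
```

With no credential available, the optional deploy-key Workspace is simply left unbound, while output still receives a per-run claim.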

Cons

  • It's a bit unclear what the incentive for including profiles would be for Catalog Task authors. How would they "figure out" the purpose and correct values to put in here?

Hash-Tags in a Workspace's Description Field

This approach would be entirely ad-hoc: Task authors could include hash tags in their Workspaces' description fields. A platform could scan for them and act accordingly. User Interfaces like Hub could be programmed to ignore them or surface them in their own visual component. Here's what Workspaces for a go-build Task might look like with these:

workspaces:
- name: source-code
  readOnly: true
  description: "The source of a go program. #data"
- name: output
  description: "Compiled binaries will be written here. #data"
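
A platform's scanning step might look something like the following Python sketch. The tag syntax it accepts (a `#` preceded by whitespace or the start of the string, followed by a lowercase word) and the `split_description` helper are assumptions, since this alternative is deliberately ad-hoc and defines no grammar.

```python
import re
from typing import Set, Tuple

RECOGNIZED_TAGS = {"cache", "config", "credential", "data"}

# A tag is "#" plus a lowercase word, not glued to preceding text.
_TAG = re.compile(r"(?<!\S)#([a-z][a-z0-9-]*)")


def split_description(description: str) -> Tuple[str, Set[str]]:
    """Return the description without recognized tags, plus those tags."""
    tags = {t for t in _TAG.findall(description) if t in RECOGNIZED_TAGS}
    # Strip only recognized tags so unrelated text such as "#1234" is kept.
    text = _TAG.sub(
        lambda m: "" if m.group(1) in RECOGNIZED_TAGS else m.group(0),
        description,
    )
    # Collapse the whitespace left behind by removed tags.
    return " ".join(text.split()), tags


text, tags = split_description("The source of a go program. #data")
```

A UI such as Hub could display `text` and render `tags` in its own component, while a platform would act only on `tags`.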

Pros

  • Free-form.
  • The set of recognized hash-tags could be specified and validated by Pipelines (#cache, #config, #credential, #data).

Cons

  • Syntactically different from the "profiles" alternative but otherwise not functionally all that different.
  • Sets a precedent for expanding the description of a workspace to include other metadata.

Loosely-Coupled Metadata

Use an external JSON file or annotations on the Task to describe the extra meaning being given to workspaces.
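
As one possible shape for the annotation variant, the Python sketch below reads a JSON-encoded map of Workspace name to purpose from a Task's annotations. The annotation key `example.com/workspace-profiles` and the JSON layout are hypothetical. The proposal does not define either.

```python
import json

# Hypothetical annotation key; the proposal does not name one.
ANNOTATION = "example.com/workspace-profiles"


def profiles_from_annotations(task: dict) -> dict:
    """Return {workspace name: purpose} from the Task's annotations, if any."""
    raw = task.get("metadata", {}).get("annotations", {}).get(ANNOTATION)
    return json.loads(raw) if raw else {}


task = {
    "kind": "Task",
    "metadata": {
        "annotations": {
            ANNOTATION: '{"deploy-key": "credential", "output": "data"}',
        }
    },
    "spec": {"workspaces": [{"name": "deploy-key"}, {"name": "output"}]},
}
profiles = profiles_from_annotations(task)
```

Because the metadata lives outside the Workspace declarations, nothing ties its keys to the declared Workspace names, so a consumer would need to check that itself.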

Pros

  • Non-API change.

Syntactic Alternatives to workspaces

Add new fields alongside workspaces that allow volumes to be bound with different defaults. For example, a credentials field whose bound volumes are mounted read-only by default. Example syntax:

workspaces:
- name: data
credentials:
- name: git # volumeMount will default to readOnly:true
- name: shortlivedtoken
  readOnly: false
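
The per-field defaulting in this example could be sketched in Python as below. The `volume_mounts` helper is hypothetical, and the sketch assumes that only the `readOnly` default differs between the two fields, which is all the example syntax shows.

```python
def volume_mounts(spec: dict) -> list:
    """Compute name/readOnly pairs, applying each field's own default."""
    mounts = []
    # workspaces default to read-write; credentials default to read-only.
    for field, read_only_default in (("workspaces", False), ("credentials", True)):
        for declaration in spec.get(field, []):
            mounts.append(
                {
                    "name": declaration["name"],
                    # An explicit readOnly on the declaration overrides the default.
                    "readOnly": declaration.get("readOnly", read_only_default),
                }
            )
    return mounts


spec = {
    "workspaces": [{"name": "data"}],
    "credentials": [{"name": "git"}, {"name": "shortlivedtoken", "readOnly": False}],
}
mounts = volume_mounts(spec)
```

Here `git` ends up read-only without saying so, while `shortlivedtoken` opts out explicitly.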

Pros

  • Very clear how a volume is intended to be used.
  • Not tied to one specific type of volume.
  • Not "stringly typed".
  • Easy to validate.
  • Structurally similar to existing workspaces feature.

Cons

  • Adding new alternative fields requires API changes.

Infrastructure Needed (optional)

Upgrade & Migration Strategy (optional)

Implementation Pull request(s)

Future Work

  • Expand support for hinting to include validation, fallback behaviour, a broader range of possible "hints" (e.g. minimum persistent volume size) etc.

References (optional)