Obtaining main outputs from WDL runners other than Cromwell? #450

Open
adamnovak opened this issue Sep 17, 2024 · 0 comments
Describe the bug

We're running into a problem where the top-level atac.wdl declares a set of outputs that doesn't include the main output files of the workflow. The main workflow claims that its only outputs are a couple of QC files:

atac-seq-pipeline/atac.wdl

Lines 1836 to 1840 in 47ba8df

output {
    File report = qc_report.report
    File qc_json = qc_report.qc_json
    Boolean qc_json_ref_match = qc_report.qc_json_ref_match
}

The documented way to actually get at the outputs the user probably wanted is to use croo, but croo requires a Cromwell-specific metadata JSON, and access to storage where the WDL engine, assumed to be Cromwell, has saved all the outputs of all the individual tasks in the workflow.

This is fine if this workflow is not meant to be standard WDL and is only ever meant to run on Cromwell (or only ever meant to run on caper). But the README documents running it on other WDL engines such as DNAnexus (which I don't think actually runs Cromwell), and there's a mention in the commit history of fixing up the WDL syntax so that it is readable by MiniWDL, which suggests that the intention is to be compatible with WDL engines other than Cromwell.

The WDL 1.0 specification doesn't seem to have a concept of a workflow run metadata JSON, nor does it say that a runner is expected to preserve individual task outputs. In fact, the spec says:

Omitting Workflow Outputs

If the output {...} section is omitted from a top-level workf[l]ow then the workflow engine should include all outputs from all calls in its final output.

I interpret this to also mean that if the top-level workflow does have an output section, like atac.wdl does, then the workflow engine is not expected to include outputs from the individual calls in what it returns to the user.

At our site, we like to use our Toil WDL engine to run workflows. It follows the spec in that it only makes individual task outputs available to the user if the top-level workflow doesn't define its own output section. As a result, all the actual atac.wdl output files that the user wants get thrown away, since the workflow says they aren't workflow outputs.

Can the output section of atac.wdl be expanded to include all the outputs the user is presumed to want access to, or else removed to indicate that the user is expected to want access to all individual task outputs?
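To make the first option concrete, here is a minimal WDL 1.0 sketch of what I mean. Everything in it is hypothetical: the task `make_result`, the workflow `demo`, and the output names `result` and `qc` are stand-ins for illustration and are not taken from atac.wdl. The real change would need to list whichever call outputs the maintainers consider the main results.

```wdl
version 1.0

# Hypothetical task standing in for any step that produces both
# a "main" result file and a QC file.
task make_result {
  command <<<
    echo "result" > result.txt
    echo "qc" > qc.txt
  >>>
  output {
    File result = "result.txt"
    File qc = "qc.txt"
  }
}

workflow demo {
  call make_result

  # Because this section is present, an engine following the spec only
  # needs to hand back what is listed here.
  output {
    File qc = make_result.qc
    # Without this line, result.txt is not a workflow output, and an
    # engine is free to discard it when the run finishes.
    File result = make_result.result
  }
}
```

Deleting the workflow-level `output` section entirely would be the second option: per the spec text quoted above, the engine should then include all outputs from all calls.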

Is this project actually targeting the general WDL 1.0 engine, or only Cromwell specifically?
