Output AWS Batch job ID from pathogen-repo-build workflow #79

Closed
joverlee521 opened this issue Apr 5, 2024 · 5 comments · Fixed by #101

Comments

@joverlee521
Contributor

If downstream jobs/workflows require the AWS Batch job ID of the build, we need to add an output to the pathogen-repo-build workflow. This idea was prompted by nextstrain/seasonal-flu#159. I'm not able to think of any other use case that would require the AWS Batch job ID output...

I don't think it's too much work to add/maintain the output but will revisit this another time.

@joverlee521 joverlee521 added the enhancement New feature or request label Apr 5, 2024
@jameshadfield
Member

jameshadfield commented Apr 7, 2024

This seems very similar to nextstrain/zika#52 where we added a job to the workflow as a workaround because we couldn't pass information from one job to another. But artifacts (for files) or outputs (for strings) should let us do this, right?

Or have I misunderstood and the technical difficulty relates to how the job which calls nextstrain build would itself get this information out of the build?

@joverlee521
Contributor Author

Ah, this is a technical bit in the GitHub Actions workflow. Now that I'm looking at it again, we already store the AWS Batch job ID as an output from the run-build job.

    outputs:
      AWS_BATCH_JOB_ID: ${{ env.AWS_BATCH_JOB_ID }}

We just need to make it an output of the reusable workflow.

@jameshadfield
Member

Right, and its value is obtained by parsing the build output:

echo "AWS_BATCH_JOB_ID=$(sed -nE 's/.+AWS Batch Job ID\:.+ ([-a-f0-9]+)$/\1/p' < "$NEXTSTRAIN_BUILD_LOG")" | tee -a "$GITHUB_ENV"

This approach is what I'm suggesting we aim for with regards to surfacing relevant information from the reusable workflow. In other words, we design some structured text pattern which (ingest/pathogen) workflows can print and the reusable workflow will parse this from the logs and surface via an output. For instance, we could use this to know whether there were "new data" or "no new data" in an ingest.
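The log-parsing step quoted above can be tried in isolation. This is only a rough sketch: the log content below is synthetic, written to fit the regular expression, and the real `nextstrain build` output may look different. It also prints the result instead of appending to `$GITHUB_ENV`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Synthetic stand-in for the build log (an assumption, not real CLI output).
NEXTSTRAIN_BUILD_LOG="$(mktemp)"
printf '%s\n' \
  'starting build' \
  '[synthetic] AWS Batch Job ID: is deadbeef-1234' \
  'build finished' > "$NEXTSTRAIN_BUILD_LOG"

# Same sed expression as in the thread: -n suppresses default printing,
# and the trailing "p" prints only lines where the substitution matched.
AWS_BATCH_JOB_ID="$(sed -nE 's/.+AWS Batch Job ID\:.+ ([-a-f0-9]+)$/\1/p' < "$NEXTSTRAIN_BUILD_LOG")"

echo "AWS_BATCH_JOB_ID=$AWS_BATCH_JOB_ID"
rm -f "$NEXTSTRAIN_BUILD_LOG"
```

Note that the pattern depends on the ID being the last token on the line and on the exact wording of the log message, so any change to that message would silently produce an empty value.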

@tsibley
Member

tsibley commented May 14, 2024

> In other words, we design some structured text pattern which (ingest/pathogen) workflows can print and the reusable workflow will parse this from the logs and surface via an output.

Why not produce a file instead of pattern matching on log output?

@jameshadfield
Member

> Why not produce a file instead of pattern matching on log output?

👍 I should have said "design some structure" -- a file would be just fine.
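The file-based alternative could look roughly like the sketch below. Everything specific here is hypothetical: the file, its `key=value` layout, and the key names `aws_batch_job_id` and `new_data` are invented for illustration and do not come from the workflow. The only real mechanism assumed is that a GitHub Actions step sets outputs by appending `name=value` lines to the file named by `$GITHUB_OUTPUT`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file that a build would write; name and keys are assumptions.
outputs_file="$(mktemp)"
printf 'aws_batch_job_id=deadbeef-1234\nnew_data=true\n' > "$outputs_file"

# On a GitHub Actions runner GITHUB_OUTPUT is already set;
# fall back to a temp file so the sketch also runs locally.
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"

# Copy only well-formed key=value lines so stray text cannot corrupt the outputs.
grep -E '^[A-Za-z_][A-Za-z0-9_]*=[^[:space:]]*$' "$outputs_file" >> "$GITHUB_OUTPUT"

cat "$GITHUB_OUTPUT"
```

Compared with matching on log text, this keeps the contract between the build and the workflow in one small file, so rewording a log message would not break the outputs.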

joverlee521 added a commit that referenced this issue Jun 18, 2024
If the workflow is run as a reusable workflow with the `aws-batch`
runtime, the workflow will output the AWS Batch job id.

This is useful for re-attaching to a complete AWS Batch job to
download the results and use them in subsequent jobs.

Resolves #79
@joverlee521 joverlee521 self-assigned this Jun 18, 2024