Skip to content

Commit

Permalink
ci: Send team alerts on specific keywords (#10986)
Browse files Browse the repository at this point in the history
* ci: Send team alerts on specific keywords

Signed-off-by: Oliver Koenig <[email protected]>

* f

Signed-off-by: Oliver Koenig <[email protected]>

---------

Signed-off-by: Oliver Koenig <[email protected]>
  • Loading branch information
ko3n1g authored Oct 27, 2024
1 parent 8fcd5cb commit 0c5f5fb
Show file tree
Hide file tree
Showing 2 changed files with 16 additions and 5 deletions.
7 changes: 7 additions & 0 deletions .github/workflows/_test_template.yml
Original file line number Diff line number Diff line change
Expand Up @@ -33,13 +33,17 @@ on:
log:
description: Last 2000 characters of the test step's log
value: ${{ jobs.main.outputs.log }}
potential_infra_failure:
description: Boolean flag when infra-related keyword spotted in logs.
value: ${{ jobs.main.outputs.potential_infra_failure }}
jobs:

main:
runs-on: ${{ inputs.RUNNER }}
outputs:
conclusion: ${{ steps.main.conclusion }}
log: ${{ steps.main.outputs.log }}
potential_infra_failure: ${{ steps.main.outputs.potential_infra_failure }}
steps:
- name: Docker system cleanup
run: |
Expand Down Expand Up @@ -75,6 +79,9 @@ jobs:
echo "log=$(tail -c 2000 err.log | base64 -w 0)" >> "$GITHUB_OUTPUT"
potential_infra_failure=$(cat err.log | grep -Eqi "gpu|cuda|device" && echo true || echo false)
echo "potential_infra_failure=$potential_infra_failure" >> "$GITHUB_OUTPUT"
exit $EXIT_CODE
- uses: "NVIDIA/NeMo/.github/actions/cancel-workflow@main"
Expand Down
14 changes: 9 additions & 5 deletions .github/workflows/cicd-main.yml
Original file line number Diff line number Diff line change
Expand Up @@ -4515,7 +4515,10 @@ jobs:
if: ${{ always() && steps.pipeline-conclusion.outputs.FAILED == 'true' && env.SLACK_WEBHOOK != '' }}
env:
SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
SLACK_WEBHOOK_ADMIN: <!subteam^${{ secrets.SLACK_WEBHOOK_ADMIN }}>
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_ACTOR: ${{ github.actor }}
BRANCH: ${{ github.head_ref || github.ref_name }}
REPOSITORY: ${{ github.repository }}
RUN_ID: ${{ github.run_id }}
PR_NUMBER: ${{ github.event.number }}
Expand Down Expand Up @@ -4571,13 +4574,15 @@ jobs:
echo "* [$JOB_NAME]($JOB_URL)" | tee -a $GITHUB_STEP_SUMMARY
LOGS=$(echo $JOB | yq '(.value.outputs.log | @base64d)' | tr -d '"')
LOGS=$([[ $(echo $LOGS | wc -c) -gt 0 ]] && echo -E "\`\`\`\n$LOGS\n\`\`\`" || echo "")
LOGS=$([[ $(echo $JOB | yq '.value.outputs.potential_infra_failure') == "true" ]] && echo -E "$LOGS\n\ncc: $SLACK_WEBHOOK_ADMIN" || echo -E "$LOGS")
SUMMARY=$(echo "$SUMMARY" | jq \
--arg pr "<$PR_URL|$PR_TITLE>" \
--arg job "<$JOB_URL|$JOB_NAME>" \
--arg logs "$LOGS" \
--arg author "<https://github.com/${{ github.actor }}|${{ github.actor }}>" \
--arg branch "<https://github.com/$REPOSITORY/tree/${{ github.head_ref || github.ref_name }}|${{ github.head_ref || github.ref_name }}>"\
--arg logs "$(echo -e "$LOGS")" \
--arg author "<https://github.com/$GITHUB_ACTOR|$GITHUB_ACTOR>" \
--arg branch "<https://github.com/$REPOSITORY/tree/$BRANCH|$BRANCH>"\
'. += [
{
"type": "section",
Expand All @@ -4588,8 +4593,7 @@ jobs:
+ "\nJob: " + $job
+ "\nAuthor: " + $author
+ "\nBranch: " + $branch
+ "\nLogs:"
+ "```\n" + $logs + "\n```"
+ "\nLogs:" + $logs
)
}
}
Expand Down

0 comments on commit 0c5f5fb

Please sign in to comment.