Propagate image building and model deployment errors #524

Merged: 5 commits, Jan 30, 2024

@ariefrahmansyah ariefrahmansyah commented Jan 26, 2024

Description

Returns more information when image building or model deployment fails, by propagating the status of the related Kubernetes resources.

For image building, returns:

  1. The Kubernetes job conditions
  2. The related pod's container status (here, the pyfunc-image-builder container)
  3. The related pod's last termination message
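
As an illustration of the list above, here is a minimal sketch of how job conditions and a termination message could be folded into one error string. It assumes a simplified `JobCondition` struct standing in for the Kubernetes batch/v1 type; `DescribeJobFailure` is a hypothetical name and not necessarily what this PR adds.

```go
package main

import (
	"fmt"
	"strings"
)

// JobCondition is a simplified stand-in (assumption) for the condition
// entries Kubernetes reports in a Job's status.
type JobCondition struct {
	Type    string // e.g. "Complete" or "Failed"
	Status  string // "True", "False" or "Unknown"
	Reason  string
	Message string
}

// DescribeJobFailure is a hypothetical helper: it renders the active
// conditions of an image building job, plus the pod's last termination
// message, as a single error string.
func DescribeJobFailure(conds []JobCondition, lastTermination string) string {
	var parts []string
	for _, c := range conds {
		// Only conditions whose status is "True" describe the current state.
		if c.Status != "True" {
			continue
		}
		parts = append(parts, fmt.Sprintf("job condition %s: %s (%s)", c.Type, c.Reason, c.Message))
	}
	if lastTermination != "" {
		parts = append(parts, "last termination message: "+lastTermination)
	}
	if len(parts) == 0 {
		return "no failure details available"
	}
	return strings.Join(parts, "; ")
}
```

Called with a `Failed` condition whose reason is `BackoffLimitExceeded` and a termination message of `OOMKilled`, the helper returns both pieces joined by a semicolon, so the caller sees the job-level and pod-level cause together.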

For model deployment, returns:

  1. The KServe inference service conditions
  2. The related pod's container status
  3. The related pod's last termination message
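
For the deployment side, the sketch below shows one way the inference service conditions could be filtered down to the ones worth reporting. The `Condition` struct is a simplified stand-in for the conditions in an inference service status, and `UnreadyConditions` is a hypothetical helper; the condition type names in the usage note are examples only.

```go
package main

import "fmt"

// Condition is a simplified stand-in (assumption) for the condition entries
// found in an inference service's status.
type Condition struct {
	Type    string
	Status  string // "True", "False" or "Unknown"
	Reason  string
	Message string
}

// UnreadyConditions is a hypothetical helper that formats every condition
// whose status is not "True", preserving the original order.
func UnreadyConditions(conds []Condition) []string {
	var out []string
	for _, c := range conds {
		if c.Status == "True" {
			continue
		}
		out = append(out, fmt.Sprintf("%s=%s: %s: %s", c.Type, c.Status, c.Reason, c.Message))
	}
	return out
}
```

Given, say, a `PredictorReady` condition with status `False` and an `IngressReady` condition with status `True`, only the first would be returned, which keeps the propagated error focused on what is actually blocking the deployment.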

Screenshots

  • When image building is OOMKilled:
[image]
  • When the Docker image for a custom model does not exist:
[image]
  • When deployment times out because no node is available:
[image]

Modifications

  1. Add the ParsePodContainerStatuses utility function
  2. Add parsing functions for the inference service and the Kubernetes job
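
The signature of ParsePodContainerStatuses is not shown in this description, so the following is only a rough sketch of what a utility of that kind might do. All types are simplified stand-ins for the Kubernetes core/v1 container status types, and `parseContainerStatuses` and `ParsedStatus` are hypothetical names.

```go
package main

import "fmt"

// Simplified stand-ins (assumptions) for the Kubernetes container state types.
type Waiting struct{ Reason, Message string }

type Terminated struct {
	ExitCode int32
	Reason   string
	Message  string
}

type State struct {
	Running    bool
	Waiting    *Waiting
	Terminated *Terminated
}

type ContainerStatus struct {
	Name                 string
	State                State
	LastTerminationState State
}

// ParsedStatus is a hypothetical flattened view of one container's status.
type ParsedStatus struct {
	Container, Status, Reason, Message string
}

// parseContainerStatuses flattens each container's current state into a row
// and returns the most recent non-empty last termination message it finds.
func parseContainerStatuses(statuses []ContainerStatus) (rows []ParsedStatus, lastTermination string) {
	for _, s := range statuses {
		row := ParsedStatus{Container: s.Name, Status: "Unknown"}
		switch {
		case s.State.Terminated != nil:
			t := s.State.Terminated
			row.Status = fmt.Sprintf("Terminated (exit code %d)", t.ExitCode)
			row.Reason, row.Message = t.Reason, t.Message
		case s.State.Waiting != nil:
			row.Status = "Waiting"
			row.Reason, row.Message = s.State.Waiting.Reason, s.State.Waiting.Message
		case s.State.Running:
			row.Status = "Running"
		}
		rows = append(rows, row)
		if t := s.LastTerminationState.Terminated; t != nil && t.Message != "" {
			lastTermination = t.Message
		}
	}
	return rows, lastTermination
}
```

Keeping the rows flat (container, status, reason, message) makes them easy to render as a table in an error response, which fits the OOMKilled and missing-image cases described above.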

Tests

Checklist

  • Added PR label
  • Added unit test, integration, and/or e2e tests
  • Tested locally
  • Updated documentation
  • Updated Swagger spec if the PR introduces API changes
  • Regenerated Golang and Python client if the PR introduces API changes

Release Notes


@ariefrahmansyah ariefrahmansyah added the enhancement (New feature or request) label Jan 26, 2024
@ariefrahmansyah ariefrahmansyah marked this pull request as ready for review January 26, 2024 09:26
api/utils/kubernetes.go (outdated review thread, resolved)
@leonlnj leonlnj left a comment

Thanks, LGTM! Left some minor comments, but overall lgtm

@ariefrahmansyah ariefrahmansyah merged commit 10e5498 into main Jan 30, 2024
32 checks passed
@ariefrahmansyah ariefrahmansyah deleted the errors-propagation branch January 30, 2024 11:59