Execute Resource Deletion Steps When 'Test Samples' Workflow Is Cancelled #1009

Open
kachawla opened this issue Mar 12, 2024 · 2 comments
Labels
triaged This item has been triaged by project maintainers and is in the backlog

Comments

@kachawla
Contributor

kachawla commented Mar 12, 2024

Currently, if a test samples workflow is manually cancelled, the steps to delete AWS/Azure resources are skipped. For example, here. It's crucial that we complete the cleanup step regardless of how the workflow terminates.

AB#11442

@sylvainsf sylvainsf added the triaged This item has been triaged by project maintainers and is in the backlog label Mar 14, 2024
@ytimocin
Contributor

For this one, I added the following checks (https://github.com/radius-project/samples/blob/v0.33/.github/workflows/test.yaml#L337-L351):

  • Delete Azure Resource Group only if Create Azure Resource Group step was successful.
  • Delete AWS Resources only if Deploy App step was successful.
  • Delete EKS Cluster only if Create EKS Cluster step was successful.
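
As a rough sketch of what checks like these can look like in a GitHub Actions job (not the actual contents of the linked test.yaml): the step ids, environment variable names, and the cleanup script path below are all hypothetical. The `always()` call is included on the assumption that cleanup should still be evaluated after an earlier step fails; without a status check function, an `if:` condition implicitly requires `success()`.

```yaml
# Sketch only: step ids, env var names, and the script path are hypothetical.
steps:
  - name: Create Azure Resource Group
    id: create-azure-rg
    run: az group create --name "$RESOURCE_GROUP" --location "$LOCATION"

  - name: Create EKS Cluster
    id: create-eks
    run: eksctl create cluster --name "$EKS_CLUSTER_NAME" --region "$AWS_REGION"

  - name: Deploy App
    id: deploy-app
    run: rad deploy "$APP_FILE"

  # ... test steps would go here ...

  # Each cleanup step is gated on the outcome of the step that created the resource.
  - name: Delete Azure Resource Group
    if: always() && steps.create-azure-rg.outcome == 'success'
    run: az group delete --name "$RESOURCE_GROUP" --yes --no-wait

  - name: Delete AWS Resources
    if: always() && steps.deploy-app.outcome == 'success'
    run: ./scripts/delete-aws-resources.sh  # hypothetical cleanup script

  - name: Delete EKS Cluster
    if: always() && steps.create-eks.outcome == 'success'
    run: eksctl delete cluster --name "$EKS_CLUSTER_NAME" --region "$AWS_REGION" --wait
```
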

If a workflow is cancelled, do we want to make sure the resources are deleted, or do we want to exit the run right away? I ask because we also have a Purge Resources workflow that runs every night to delete all dangling resources.

What do we want to do when the workflow is cancelled? I'd be okay with either option: stopping the run immediately (without deleting any resources) or stopping the run after deleting the resources.

Would love to hear from everyone. cc/ @kachawla @willdavsmith @rynowak @youngbupark

@kachawla
Contributor Author

Thanks for looking into this @ytimocin.

> Delete AWS Resources only if Deploy App step was successful.

This could be a problem when the app deployment fails in a partially deployed state: some AWS resources for the app may already exist, and we would still want to delete them. Let me know if that scenario can't actually happen.
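
One way to cover the partial-deployment case, as a sketch under assumptions rather than a tested change: gate the deletion on whether Deploy App was attempted at all, instead of on whether it succeeded. The step id `deploy-app` and the script path are hypothetical, and the approach assumes the deletion script tolerates resources that were never created.

```yaml
# Sketch only: `deploy-app` and the script path are hypothetical names.
steps:
  - name: Delete AWS Resources
    # outcome is one of: success, failure, cancelled, skipped.
    # Anything other than 'skipped' means Deploy App started, so resources may exist.
    if: always() && steps.deploy-app.outcome != 'skipped'
    run: ./scripts/delete-aws-resources.sh  # hypothetical; should be safe to run when nothing was deployed
```
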

> Delete EKS Cluster only if Create EKS Cluster step was successful.

This sounds reasonable as long as a failed Create EKS Cluster step guarantees cleanup of any underlying resources AWS created along the way.

> If a workflow is cancelled, do we want to make sure the resources are deleted, or do we want to exit the run right away?

I think we should still run all the cleanup steps even if the workflow was cancelled. Ideally, the Purge Resources workflow should only be needed to clean up things outside our control (RDS snapshots, for example). @willdavsmith thoughts?
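
For reference, the two behaviours discussed in this thread map onto two GitHub Actions status check functions. This is a minimal sketch with a hypothetical step id, variable name, and an abbreviated command: `!cancelled()` lets a step run after a failure but skips it once the run is cancelled, while `always()` keeps the step eligible even after cancellation.

```yaml
# Sketch only: the step id and env var name are hypothetical.
steps:
  # Option A: exit right away on cancel; cleanup still runs after ordinary failures.
  - name: Delete Azure Resource Group (skipped on cancel)
    if: ${{ !cancelled() && steps.create-azure-rg.outcome == 'success' }}
    run: az group delete --name "$RESOURCE_GROUP" --yes --no-wait

  # Option B: clean up even when the run is cancelled.
  - name: Delete Azure Resource Group (runs on cancel)
    if: ${{ always() && steps.create-azure-rg.outcome == 'success' }}
    run: az group delete --name "$RESOURCE_GROUP" --yes --no-wait
```

A real workflow would keep only one of the two variants. With either choice, the nightly Purge Resources workflow still acts as a backstop for anything a run leaves behind.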
