oci/int: Add separate resource cleanup step #712
Merged
Introduce a destroy-only mode in the integration test runner that runs `terraform destroy` for the respective cloud provider configurations. This can be used to destroy cloud resources without going through the whole provision-test process.
Add a new step at the very end of the GitHub Actions workflow that runs the test binary in destroy-only mode, irrespective of the result of the previous steps, by using the `always()` status check function (refer https://docs.github.com/en/actions/learn-github-actions/expressions#always ). Because this step always runs, the infrastructure is destroyed even when a previous step fails or the CI job is cancelled.
This change doesn't use the `Options` from `test-infra/tftestenv` for the `destroy-only` flag, as that has not been adopted by the current test setup yet.

This was added to address a recent CI failure: an issue in GCP caused cluster provisioning to take more than 30 minutes, which is the test timeout. After the timeout, the test binary was terminated and couldn't perform a graceful cleanup. To work around such scenarios, the cleanup can be run separately at the end with its own timeout, so that it doesn't affect the test runtime.
Refer: https://github.com/fluxcd/pkg/actions/runs/7409263553/job/20159178143#step:13:338
Example test run: https://github.com/fluxcd/pkg/actions/runs/7423829019/job/20202121975#step:14:22
Another example run where a cancelled job still runs the cleanup: https://github.com/fluxcd/pkg/actions/runs/7424467088/job/20204287128#step:13:59
However, this still depends on the resources being recorded in the Terraform state file. If the job is cancelled too early, before the resources are written to the state file, the cleanup won't be able to detect and delete them.