Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition in PVC deletion #7149

Merged
merged 1 commit into from
Oct 3, 2023

Conversation

QuanZhang-William
Copy link
Member

Changes

fixes #7138. In coschedule:pipelineruns mode, the pvcs created by VolumeClaimTemplate should be automatically deleted when the owning pipelinerun is completed. To delete a pvc used by a pod (pipelinerun), the kubernetes.io/pvc-protection finalizer must be removed.

Prior to this commit, the Tekton reconciler attempted to remove the kubernetes.io/pvc-protection finalizer first and then deletes the created pvcs. This results in race condition since PVCProtectionController tries to add back the finalizer if the pvc is in bounded status . If the add back action happens before the PVC deletion call is completed, the PVC will be left in terminating status.

This commit changes the reconciler to delete the PVC first and then removes the finalizer to fix the race condition. This commit also adds integration test to validate pvc status when a pipelinerun is completed.

/kind bug

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • Has Tests included if any functionality added or changed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including functionality, content, code)
  • Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

Bug fix: delete PVCs created by VolumeClaimTemplates when the owning PipelineRun is completed

@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 25, 2023
@tekton-robot tekton-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 25, 2023
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/volumeclaim/pvchandler.go 82.6% 54.3% -28.3

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/volumeclaim/pvchandler.go 82.6% 54.3% -28.3

pkg/reconciler/volumeclaim/pvchandler.go Show resolved Hide resolved
@@ -243,35 +242,3 @@ func TestCreateExistPersistentVolumeClaims(t *testing.T) {
t.Fatalf("unexpected PVC name on created PVC; expected: %s got: %s", expectedPVCName, pvcList.Items[0].Name)
}
}

func TestPurgeFinalizerAndDeletePVCForWorkspace(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this test deleted? it should still be valid no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the simple client set cannot mock the pvc with finalizer scenario. Specifically, the pvcs are already successfully deleted in the first API call, so the second call to remove finalizer will fail since the pvc is already deleted.

I could potentially use some expected errors in the test case to compare, but I feel it is quite confusing. And the functionality is already covered by integration test.

Copy link
Member

@dibyom dibyom Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, one way to do this could be to use something like PrependReactor for the delete call and have it set the deletionTimestamp instead of deleting the object

not exactly the same thing, but similar:

c.KubeClientSet.CoreV1().(*fake.FakeCoreV1).PrependReactor("create", "persistentvolumeclaims",
func(action testing2.Action) (handled bool, ret runtime.Object, err error) {
return true, &corev1.PersistentVolumeClaim{}, errors.New("error creating persistentvolumeclaims")
})

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the advice, I have added the test back with your suggestion. PTAL!

return pollImmediateWithContext(ctx, func() (bool, error) {
pvcList, err := c.KubeClient.CoreV1().PersistentVolumeClaims(namespace).List(ctx, metav1.ListOptions{})
if err != nil {
return true, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding docString here for the indications of the bool error being returned?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it just follows the expected function signature, and k8s does have docstring for it:

// ConditionFunc returns true if the condition is satisfied, or an error

@JeromeJu
Copy link
Member

JeromeJu commented Oct 2, 2023

/lgtm

@dibyom dibyom assigned dibyom and unassigned JeromeJu Oct 2, 2023
@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 2, 2023
@QuanZhang-William
Copy link
Member Author

/test pull-tekton-pipeline-go-coverage

fixes [tektoncd#7138][tektoncd#7138]. In `coschedule:pipelineruns` mode, the `pvcs` created by `VolumeClaimTemplate` should be automatically deleted when the owning `pipelinerun` is completed.
To delete a `pvc` used by a pod (pipelinerun), the [kubernetes.io/pvc-protection][finalizer] finalizer must be removed.

Prior to this commit, the Tekton reconciler attempted to remove the `kubernetes.io/pvc-protection` finalizer first and then deletes the created pvcs. This results
in race condition since `PVCProtectionController` tries to add back the finalizer if the `pvc` is in `bounded` status . If the add back action happens before the PVC deletion call is completed,
the `PVC` will be left in `terminating` status.

This commit changes the reconciler to delete the `PVC` first and then removes the `finalizer` to fix the race condition.

This commit removes `TestPurgeFinalizerAndDeletePVCForWorkspace` UT, since the finalizer behavior cannot be mocked when a PVC is deleted.
This commit also adds integration test to cover this scenario.

/kind bug

[tektoncd#7138]: tektoncd#7138
[finalizer]: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#storage-object-in-use-protection
@tekton-robot tekton-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 2, 2023
@QuanZhang-William
Copy link
Member Author

@dibyom @JeromeJu . I have further improved unit test, can you please take another look?

@dibyom
Copy link
Member

dibyom commented Oct 3, 2023

Nicely done! thanks @QuanZhang-William
/approve

@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dibyom, Yongxuanzhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 3, 2023
@Yongxuanzhang
Copy link
Member

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 3, 2023
@tekton-robot tekton-robot merged commit 5a68859 into tektoncd:main Oct 3, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deleting PVCs created by VolumeClaimTemplate is flaky
5 participants