Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up PVCs when canceling populator flows #590

Merged
merged 7 commits into from
Oct 23, 2023

Conversation

liranr23
Copy link
Member

This patch will clean up the PVCs (final and prime) associated with VMs that the migration was cancled for them.

Copy link
Member

@ahadas ahadas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can move our offline discussion about specific handling for canceled migrations here - why do we need special handling, isn't it a concrete case of a migration that completes (unsuccessfully) before posting a vm?

@liranr23
Copy link
Member Author

liranr23 commented Sep 12, 2023

we can move our offline discussion about specific handling for canceled migrations here - why do we need special handling, isn't it a concrete case of a migration that completes (unsuccessfully) before posting a vm?

yup. it means the vm doesn't exist, there is no owner to the PVCs. it will break a restart tryout and if you just cancel because you don't wish to continue you will have leftovers.

@ahadas
Copy link
Member

ahadas commented Sep 12, 2023

yup. it means the vm doesn't exist, there is no owner to the PVCs.

ok, but how is it different from a migration that fails during disk transfer?

it will break a restart tryout

what's the reason that 'restart' would fail in this case and work for a migration that failed during disk transfer?

and if you just cancel because you don't wish to continue you will have leftovers.

I don't see this as a problem - it's ok to have "leftovers" as long as the migration exists, it's important to remove everything when the plan is removed

@liranr23
Copy link
Member Author

liranr23 commented Sep 13, 2023

yup. it means the vm doesn't exist, there is no owner to the PVCs.

ok, but how is it different from a migration that fails during disk transfer?

it doesn't.

it will break a restart tryout

what's the reason that 'restart' would fail in this case and work for a migration that failed during disk transfer?

i need to check it. but if we have the PVC and try to recreate them i think we will fail. migration that failed can have the same problem.

and if you just cancel because you don't wish to continue you will have leftovers.

I don't see this as a problem - it's ok to have "leftovers" as long as the migration exists, it's important to remove everything when the plan is removed

you suggest having it on every cleanup flow? e.g based on the PVC has ownership?

@ahadas
Copy link
Member

ahadas commented Sep 13, 2023

I don't see this as a problem - it's ok to have "leftovers" as long as the migration exists, it's important to remove everything when the plan is removed

you suggest having it on every cleanup flow? e.g based on the PVC has ownership?

yeah, assuming that by "cleanup flow" you refer to what we do before restarting a plan or during archive operation

@liranr23 liranr23 force-pushed the cleanup_pvcs_cancel branch from dd54aa5 to 3597547 Compare October 2, 2023 13:21
@liranr23
Copy link
Member Author

liranr23 commented Oct 2, 2023

I don't see this as a problem - it's ok to have "leftovers" as long as the migration exists, it's important to remove everything when the plan is removed

you suggest having it on every cleanup flow? e.g based on the PVC has ownership?

yeah, assuming that by "cleanup flow" you refer to what we do before restarting a plan or during archive operation

Done.
Overall the main issue is that RestartLimitExceeded function doesn't work on the populator pod. The pod is terminated and not restarting, causing the limitation to be useless. If that would work we could know that the plan failed without looping "forever".
Currently the populator controller creates the pod with RestartPolicyNever, but maybe we wish to change it into RestartPolicyOnFailure?

@liranr23 liranr23 force-pushed the cleanup_pvcs_cancel branch from 3597547 to a0ed5e5 Compare October 2, 2023 14:11
@ahadas ahadas added this to the 2.6.0 milestone Oct 16, 2023
Copy link
Member

@ahadas ahadas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changes look good, minor comments inside
about restarting the populator pod on failure - yeah, it makes sense to align the populator pods with importer pods, but that's a separate task

pkg/controller/plan/migration.go Outdated Show resolved Hide resolved
pkg/controller/plan/kubevirt.go Outdated Show resolved Hide resolved
@liranr23 liranr23 force-pushed the cleanup_pvcs_cancel branch from a0ed5e5 to cbb441b Compare October 17, 2023 15:10
@ahadas ahadas removed this from the 2.6.0 milestone Oct 18, 2023
@liranr23 liranr23 force-pushed the cleanup_pvcs_cancel branch from cbb441b to 1063019 Compare October 18, 2023 11:20
This patch will clean up the PVCs (final and prime) associated with VMs
that the migration was cancled for them.

Signed-off-by: Liran Rotenberg <[email protected]>
@liranr23 liranr23 force-pushed the cleanup_pvcs_cancel branch from 1063019 to 12d417b Compare October 18, 2023 14:55
@ahadas ahadas force-pushed the cleanup_pvcs_cancel branch 5 times, most recently from c5884f9 to 4d9fc2a Compare October 19, 2023 08:37
To return an error when we can't get a prime PVC rather than continue to
the next PVC.

Also refactored the code a bit to reduce cognitive complexity.

Signed-off-by: Arik Hadas <[email protected]>
@ahadas ahadas force-pushed the cleanup_pvcs_cancel branch from 4d9fc2a to ef67399 Compare October 19, 2023 08:49
Renamed it to DeletePopulatedPVCs and extract the sections that remove
the corresponding prime PVC and the patched PVC to separate functions.

Also rename Migration#cleanUpPopulatorPVCs to delete PopulatorPVCs.

Signed-off-by: Arik Hadas <[email protected]>
@ahadas ahadas force-pushed the cleanup_pvcs_cancel branch from 9571e3d to 26993cb Compare October 22, 2023 15:15
@ahadas ahadas force-pushed the cleanup_pvcs_cancel branch from 663db1e to 6e3e8a7 Compare October 22, 2023 15:31
This script can be used to clean up vendored code from security issues,
here we remove two unused Dockerfiles that have some security issues.

Signed-off-by: Arik Hadas <[email protected]>
@ahadas ahadas force-pushed the cleanup_pvcs_cancel branch from d268912 to b3e2119 Compare October 22, 2023 15:46
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.5% 0.5% Duplication

@ahadas ahadas merged commit cf8277d into kubev2v:main Oct 23, 2023
8 checks passed
@liranr23 liranr23 deleted the cleanup_pvcs_cancel branch October 23, 2023 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants