Allow for longer transfers without a constant connection to the pod/k8s api #71
Comments
Thanks for the feature suggestion. I see the problem, and indeed, it makes sense to address it. I agree with your suggested solution of a detach mode, with something like a `--detach` flag.
About the robust connection mechanism, I agree - it makes total sense. However, there can be various failure scenarios that require different handling: some need a reconnect, while others might need jumping to the next strategy. For example, a connection failure can be recovered with a retry. Differentiating between some "common" errors like a connection timeout and handling them specially might be a good start. I am open to ideas here - let me know if you have any. If you are interested in submitting PRs for any of these, please let me know so that I don't start working on them in the meantime :)
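A minimal sketch of the "handle common errors specially" idea, not pv-migrate's actual code: retry only failures that look like connectivity problems (with backoff), and surface everything else immediately so the caller can decide whether to fall back to the next strategy. The `runWithRetry` helper and the `isRetryable` heuristic are made up for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// runWithRetry retries attempt() only for failures that look transient
// (timeouts, dropped connections). Any other error is returned immediately
// so the caller can fall back to the next strategy instead of retrying.
func runWithRetry(ctx context.Context, attempts int, attempt func(context.Context) error) error {
	backoff := 2 * time.Second
	var lastErr error
	for i := 0; i < attempts; i++ {
		lastErr = attempt(ctx)
		if lastErr == nil {
			return nil
		}
		if !isRetryable(lastErr) {
			return lastErr // not a connectivity problem: don't retry
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
			backoff *= 2 // exponential backoff between reconnect attempts
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, lastErr)
}

// isRetryable is a rough heuristic: treat timeouts as transient,
// everything else as a hard failure.
func isRetryable(err error) bool {
	var netErr net.Error
	if errors.As(err, &netErr) && netErr.Timeout() {
		return true
	}
	return errors.Is(err, os.ErrDeadlineExceeded)
}

func main() {
	err := runWithRetry(context.Background(), 3, func(ctx context.Context) error {
		// placeholder for the actual transfer step
		return nil
	})
	fmt.Println("result:", err)
}
```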
Thanks for the response - I like the phrasing of the '--detach' flag a lot; I was struggling with the wording, hence "fire-and-forget", haha. I've not yet taken a look through the code (I'm sure it's fairly simple though!), but I'd be happy to start working on the `--detach` flag. I'm not yet aware of all the details of when and why you might need to jump to the next strategy that couldn't be detected up-front. From what I can see so far, the logic for choosing is:
I guess that production k8s deployments with complex internal routing & firewalling might have communication blocked between certain namespaces/pods - so you'd need to fall back to lbsvc in those circumstances, and detect them by the rsync pod failing?
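For concreteness, here is a rough sketch of what a "fire-and-forget" / `--detach` path could look like with client-go. It is not pv-migrate's actual implementation, and it assumes the transfer runs as a Kubernetes Job (the real tool may use different resources): the client creates the Job and, when detaching, returns as soon as the API server accepts it instead of waiting for completion.

```go
package migration

import (
	"context"
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// startMigrationJob creates the transfer Job on the cluster. With detach=true
// it returns as soon as the Job is accepted by the API server, so the client
// (e.g. a laptop) can disconnect while the transfer keeps running in-cluster.
func startMigrationJob(ctx context.Context, client kubernetes.Interface,
	namespace string, job *batchv1.Job, detach bool) error {

	created, err := client.BatchV1().Jobs(namespace).Create(ctx, job, metav1.CreateOptions{})
	if err != nil {
		return err
	}
	if detach {
		fmt.Printf("migration job %s started; detaching\n", created.Name)
		return nil // fire-and-forget: don't stream logs or wait for completion
	}
	return waitForCompletion(ctx, client, namespace, created.Name)
}

// waitForCompletion would watch the Job status and stream progress as the
// non-detached mode does today; omitted here for brevity.
func waitForCompletion(ctx context.Context, client kubernetes.Interface,
	namespace, name string) error {
	return nil
}
```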
Thanks, no need to rush on the feature - take your time :) The logic works as follows:
I chose to implement it with two separate fallback triggers - "can't do" and "error" situations - because there can be reasons for a migration to fail that I cannot predict. Like you said, NetworkPolicies can be one reason.
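To make the two triggers concrete, here is a sketch of such a fallback loop. The `Strategy` interface, `ErrCannotDo` sentinel, and function names are invented for illustration and are not pv-migrate's real types: a "can't do" result means the strategy was never applicable, while any other error means it was attempted and failed; both lead to the next strategy.

```go
package migration

import (
	"context"
	"errors"
	"log"
)

// ErrCannotDo signals that a strategy is not applicable to the given
// migration (the "can't do" case), as opposed to a runtime failure.
var ErrCannotDo = errors.New("strategy cannot handle this migration")

// Strategy is a stand-in for whatever interface the real strategies implement.
type Strategy interface {
	Name() string
	Run(ctx context.Context) error
}

// runStrategies tries each strategy in order. Both triggers lead to the next
// strategy, but for different reasons: "can't do" means it was never
// applicable, while any other error means it was attempted and failed.
func runStrategies(ctx context.Context, strategies []Strategy) error {
	for _, s := range strategies {
		err := s.Run(ctx)
		switch {
		case err == nil:
			return nil // migration succeeded
		case errors.Is(err, ErrCannotDo):
			log.Printf("strategy %s not applicable, trying next", s.Name())
		default:
			log.Printf("strategy %s failed (%v), falling back to next", s.Name(), err)
		}
	}
	return errors.New("all strategies failed or were not applicable")
}
```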
Yep - that makes sense. I wonder if we could detect whether a transfer began successfully - if it began, I would think we ought to be able to assume that if it can finish, it can finish using the current strategy. Parsing the log output is already happening (I assume), given the progress bar's existence - so perhaps detect the first finished file? I also thought about running a test transfer between the PVs and going with the first strategy that completes the test successfully, without a fallback mechanism - though I'm unsure about that if you want to defend against things changing mid-transfer. Would you like to keep this issue for discussing improvements to the fallback mechanism, and I can open a second ticket for the `--detach` flag?
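A sketch of the "first finished file" idea: scan the rsync output stream and treat the first line that looks like a transferred file (rather than a progress update or summary) as evidence that the transfer got past connection setup, so the current strategy could be locked in instead of falling back. The exact rsync output format depends on the flags used, so the matching below is only a heuristic for illustration.

```go
package migration

import (
	"bufio"
	"io"
	"strings"
)

// transferStarted reports whether the rsync output stream contains at least
// one line that looks like a transferred file path, i.e. evidence that the
// transfer got past connection setup. Progress lines (percentages, speeds)
// and summary lines are skipped. The heuristics here are illustrative only.
func transferStarted(logs io.Reader) bool {
	scanner := bufio.NewScanner(logs)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.Contains(line, "%") {
			continue // skip blanks and progress updates
		}
		if strings.HasPrefix(line, "sent ") || strings.HasPrefix(line, "total size") {
			continue // skip the end-of-run summary
		}
		// Anything else is treated as a file being reported as transferred.
		return true
	}
	return false
}
```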
I think we can proceed on this ticket. I thought about this a bit, and I think something like this would work:
What do you think? We can iterate on it as we go forward.
Is your feature request related to a problem? Please describe.
I triggered a migration from my laptop using pv-migrate to move a large amount of data (1.5TB) into my cluster-managed storage. Due to the length of time the migration took, my laptop went in and out of connectivity and ended up cycling through all the migration strategies.
Describe the solution you'd like
I can see two solutions here, which are not mutually exclusive:
1. A "fire-and-forget" mode, where the migration runs entirely in the cluster and does not depend on the client staying connected.
2. A more robust connection mechanism that retries/reconnects instead of immediately falling through to the next strategy.