Backup is Triggered with same backupNumber #962
Comments
The same is true for restoring the backup, by the way: we see two consecutive runs.
Why is this message triggered twice? And why does reconciliation continue without waiting for the restore to finish? We have confirmed on the pod/container that there are two instances of restore.sh running, which does not seem to be the way it should work.
This is odd. Can you send me the output of ps -ef from inside the backup container? I have never seen two backups triggered only a few seconds apart.
Thanks for taking a look at it. Here is the output of 'ps -ef'.
The CRD scripts are adapted and mounted from our ConfigMap. It might be worth mentioning that we have PROD and TEST running in the very same cluster in different namespaces (the operator is duplicated as well, and each instance only watches its own namespace).
This is clearly a new instance of the operator trying to restore the same backup. We may need to add a safer mechanism that prevents the operator from running the restore command again, and that prevents a backup from starting while another one is running.
We've also seen duplicated restores, but in our case it happens every time on every instance, and the runs appear to be consecutive, not concurrent. This PR fixed it for us. After giving it a second look, I think the backup issue occurs if jenkins-operator restarts. Similar to the restore issue I discovered, the
Thanks a lot for the restore fix!
Honestly, based on the operator code, I'm more inclined to add this logic to the backup script, since the original authors designed the operator to be agnostic of the type and logic of the backup.
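For illustration, a guard of this kind inside the script could look roughly like the sketch below. It is not the code from the PR: `LOCK_DIR`, `acquire_lock`, `release_lock` and `run_exclusive` are hypothetical names, and the sketch assumes both script instances run in the same container and share a filesystem.

```shell
#!/usr/bin/env bash
# Sketch only: LOCK_DIR and all function names are hypothetical, not part of the operator.
LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/jenkins-backup.lock}"

# mkdir is atomic: only one caller can create the directory, so only one wins the lock.
acquire_lock() {
    mkdir "$LOCK_DIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null
}

# Run the given command only if no other backup/restore currently holds the lock.
run_exclusive() {
    if ! acquire_lock; then
        echo "another backup/restore is already running, skipping" >&2
        return 0   # skip quietly so the duplicate trigger is not reported as a failure
    fi
    "$@"
    local status=$?
    release_lock
    return "$status"
}
```

A script would then wrap its real work, e.g. `run_exclusive do_backup "$1"` with `do_backup` being whatever function performs the backup. Note that a lock directory left behind by a killed process blocks later runs, so a real script would also need to clean it up on exit or detect stale locks.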
I started the work on this PR; you can test it with this temporary image:
OK, I think it is now stable enough to test:
The new 0.8.1 should fix this issue. If it does not, drop a comment and I will re-open the issue.
Due to the size of the backup we only trigger a backup every two hours (interval: 7200).
For some as yet unknown reason, the backup is in some cases triggered twice within a very short time. Our backup script now detects this problem and exits (without error).
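A check of this kind can be sketched as follows. This is not our actual script: the archive naming (`<backupNumber>.tar.gz`), the `BACKUP_DIR` and `JENKINS_HOME` defaults, and the function names are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch only: paths, archive naming and function names are assumptions.
BACKUP_DIR="${BACKUP_DIR:-/backup}"

# Succeeds if a non-empty archive for this backup number already exists.
backup_exists() {
    local number="$1"
    [ -s "${BACKUP_DIR}/${number}.tar.gz" ]
}

backup_once() {
    local number="$1"
    if backup_exists "$number"; then
        echo "backup ${number} already exists, skipping"
        return 0   # exit without error so the duplicate trigger is harmless
    fi
    # Write under a temporary name first, so a half-written archive
    # is never mistaken for a finished backup.
    local tmp="${BACKUP_DIR}/${number}.tar.gz.partial"
    tar -czf "$tmp" -C "${JENKINS_HOME:-/jenkins-home}" . \
        && mv "$tmp" "${BACKUP_DIR}/${number}.tar.gz"
}
```

This only catches a second trigger that arrives after the first run has finished. Two runs that start at nearly the same time both pass the check, so it does not replace a lock.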
Find an excerpt from the logs below.
Backups 1284 and 1287 worked as expected.
Backups 1285, 1286, 1288 and 1289 were run twice.
We have already slightly adapted the backup and restore scripts in order to trace the issues. A crucial part might be the proper way of emitting and handling signals in the script.
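As a point of reference for the signal question, a minimal pattern for handling termination signals in a bash script is sketched below. `TMP_ARCHIVE`, `cleanup` and `on_signal` are hypothetical names for this example and are not taken from our script.

```shell
#!/usr/bin/env bash
# Sketch only: TMP_ARCHIVE, cleanup and on_signal are hypothetical names.
TMP_ARCHIVE=""

# Remove a partially written archive, if one was recorded.
cleanup() {
    if [ -n "$TMP_ARCHIVE" ]; then
        rm -f "$TMP_ARCHIVE"
    fi
    return 0
}

on_signal() {
    echo "received SIG$1, aborting" >&2
    cleanup
    # By convention the exit status for a signal is 128 + the signal number.
    case "$1" in
        TERM) exit 143 ;;
        INT)  exit 130 ;;
    esac
}

trap 'on_signal TERM' TERM
trap 'on_signal INT' INT
trap cleanup EXIT
```

One caveat: bash runs a trap handler only after the current foreground command returns, so a long-running `tar` delays the handler unless the command is started in the background and waited for with `wait`.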
used backup script