-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try to remove items from completed builds even if supposedly already cancelled #259
Merged
Merged
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
b0aeaca
Try to remove items from completed builds even if supposedly already …
jglick 92bf46b
Merge branch 'master' into stopping
jglick f72d3ea
Logging an extra warning in cases where this PR is active https://git…
jglick 16915df
Merge branch 'master' into stopping
jglick File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDK about this. It was only ever intended to run once after a Jenkins restart (and Damien says that in the current case ci.jenkins.io has not yet restarted), so I am worried that removing the guard may lead to log spam or further confusion in some cases. Maybe instead we could add a call to
Timer.execute
afterworkflow-durable-task-step-plugin/src/main/java/org/jenkinsci/plugins/workflow/support/steps/ExecutorStepExecution.java
Line 148 in f256ed3
I think it would be harmless to test this new code, but if ci.jenkins.io really has not yet restarted, I would be interested to see if the previous code is enough to cancel the task after a restart or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I'm sorry, our comments crossed: #259 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also concerned about this. But @dduportal said that the script (which mimics what this PR would do) indeed canceled the bogus items successfully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if it is fine in this case, it seems clearer and safer to me to trigger an explicit delayed cancellation when the original cancellation attempt fails rather than have
getCauseOfBlockage
potentially cancel things more than once. For example, if theTimer
thread pool is somewhat congested and the task does not run for more than 5 seconds, I think a second task could be scheduled with the proposed code (not really a big deal).Either way, at this point it seems that the issue seen on ci.jenkins.io happens while the controller is running rather than being a result of some problem with persisted state after a restart, which I did not know was possible and is distinct from what I was trying to fix when I filed #185. Ideally we would try to reproduce the new issue in a test to have a chance at fixing the underlying cause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that was also my understanding.
My reservation is that we do not understand the cause of the original failure to cancel, so we cannot know whether that would actually work. The idea with this PR is to be a final fallback in case other things failed for unknown reasons, to make sure the queue items do not remain there indefinitely.
Yes of course that would be best, but I have no clear notion of what happened.
BlockedItem.leave
not working? One case appeared to relate to a Dependabot PR being closed, but that does nothing to explain whyQueue.cancel
would fail.Not sure what to do at this point other than see how things behave on ci.jenkins.io.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't been paying attention too closely but I think I've noticed this on ci.jenkins.io when a PR build is aborted due to a new build starting, i.e.:
disableConcurrentBuilds(abortPrevious: true)
the first PR build is abortedThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that would be a common trigger for
workflow-durable-task-step-plugin/src/main/java/org/jenkinsci/plugins/workflow/support/steps/ExecutorStepExecution.java
Lines 144 to 149 in a62ccc9
cancel
would be returning false (as logs confirm) and the item left in the queue.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we do not really know whether this PR works yet either. As far as we know, the existing code would have fixed the issue when Damien restarted ci.jenkins.io, and we have no way to tell whether the new case is being hit. I think we should at least use a distinct log message for the case where
stopping
is alreadytrue
so that we would know for sure when it applies and since "Refusing to build ..." does not really make sense as an explanation in this scenario anyway (more like "Desperately trying to cancel ...").Maybe we should use both approaches, at least temporarily, with a distinct log message in each case, just to see what happens. Knowing whether an asynchronous cancellation right after the synchronous cancellation in
stop
works or not could be a useful clue to anyone who ends up trying to track down the root cause in the future.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not directly, but analogous calls made from
/script
did work, so we have reason to believe it would have.Yes it probably would have, but we do not want to require admins to restart the server just to clean up items we know are invalid.
Could try that. @dduportal do you believe this issue happens enough that you would be likely to see it again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jenkins-infra/helpdesk#3211 (comment)