signal handling: User-defined interrupt handlers #49541
base: master
For
would register |
Force-pushed from cb46a12 to ef6a037.
Force-pushed from 8a9b460 to c969a35.
To make this feature more useful, I've put together a small package, InterruptHandling.jl. This package registers an interrupt handler which, on interrupt, shows a TerminalMenus prompt with a list of libraries/modules to interrupt; it additionally provides the ability for libraries to register their own handlers with InterruptHandling so that they can be selectively interrupted from the prompt. It's my hope that we can standardize on this package or another like it for a number of reasons:
I would love to know what people think about this approach! 😄 EDIT: This functionality is now built into this PR when running in the REPL.
I wanted to test how this behaves on multiple interrupts, one after another, but the package does not appear to work on my end:
|
@Seelengrab sorry, I really need to add some docs; you use |
Also, after thinking on this, I realized that we'd probably need InterruptHandling.jl to become part of Base, because otherwise Base and stdlibs wouldn't be able to use it (which I think is important). It only depends on TerminalMenus, although we could also provide a non-TerminalMenus fallback if we don't want a hard dependency (in the event we want to move TerminalMenus out of the sysimage). EDIT: Implemented in latest push for REPL sessions, with an interrupt-all behavior for non-REPL sessions.
Another suggested feature for discussion: an |
I think people might care about force-deliver-sigint remaining available, particularly since this usually delays handling much more?
Some issues/PRs I stumbled across while looking for something different, that seem related/solved by this: |
@vtjnash What does it take to get this done? |
I have the integration of the TerminalMenus-based handler working fine locally, but I still need to:
And then I think this will be good to go, if people are OK with it.
Force-pushed from c969a35 to e76c157.
The latest push adds the fancy TerminalMenus-based interrupt handler for REPL sessions, and moves multi-handler registration into Base (instead of doing this in the runtime). We also now have interrupt-all-handlers behavior when not running the REPL, which I am planning to extend with something like a force-interrupt on rapid Ctrl-C in succession (like our current force behavior, but quicker to trigger). There are just a few things to address (see the Todo list in the first comment in this PR), but I think this is just about ready to go!
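To make the "force-interrupt on rapid Ctrl-C" idea concrete, here is a minimal sketch of one way such a trigger could be detected. Nothing here is from the PR: `RapidInterruptDetector`, `press!`, the press count, and the time window are all hypothetical names and values chosen for illustration.

```julia
# Hypothetical detector; the press count and time window are illustrative assumptions,
# not values taken from this PR or from Julia's existing force-interrupt logic.
mutable struct RapidInterruptDetector
    needed::Int              # presses required to force an interrupt
    window::Float64          # seconds within which they must arrive
    stamps::Vector{Float64}  # timestamps of recent presses
end

RapidInterruptDetector(; needed=3, window=1.0) =
    RapidInterruptDetector(needed, window, Float64[])

# Record one Ctrl-C at time `now` (in seconds) and report whether
# `needed` presses have landed inside the trailing `window`.
function press!(d::RapidInterruptDetector, now::Float64=time())
    push!(d.stamps, now)
    filter!(t -> now - t <= d.window, d.stamps)  # drop presses that are too old
    return length(d.stamps) >= d.needed
end
```

A signal-handling loop could call `press!` on every interrupt and fall back to the forceful path only when it returns `true`; isolated presses would still go to the registered handlers.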
Force-pushed from e76c157 to aa243ed.
const INTERRUPT_HANDLERS_LOCK = Threads.ReentrantLock()
const INTERRUPT_HANDLERS = Dict{Module,Vector{Task}}()
const INTERRUPT_HANDLER_RUNNING = Threads.Atomic{Bool}(false)
Aren't these atomics deprecated?
What would you use instead? The `@atomic` interface doesn't provide an alternative for a single but entirely atomically accessed resource, as far as I'm aware.
I can turn it into a struct, but there's only ever one instance of it.
We should make it just a global, but currently those don't support custom atomic annotations (they are always just release/consume)
The "clean" solution would be to have a struct with `@atomic` fields, stored in a global `Ref`. The constructor would check whether the `Ref` is already assigned and, if so, refuse to create a second instance.
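A rough sketch of what that suggestion could look like, for discussion only. The names `InterruptState`, `INTERRUPT_STATE`, `try_begin_handling!`, and `end_handling!` are hypothetical and do not come from this PR; the sketch also assumes the constructor is only called from one task (the `isassigned` check itself is not thread-safe).

```julia
# Hypothetical names throughout; a sketch of the suggestion above, not code from this PR.
mutable struct InterruptState
    @atomic running::Bool
    function InterruptState()
        # Refuse a second instance once the global Ref already holds one.
        isassigned(INTERRUPT_STATE) && error("InterruptState is already instantiated")
        state = new(false)
        INTERRUPT_STATE[] = state
        return state
    end
end

# Starts out unassigned; filled in by the first (and only) constructor call.
const INTERRUPT_STATE = Ref{InterruptState}()

# Claim the "handler running" flag; returns true only for the caller
# that atomically flipped it from false to true.
try_begin_handling!(s::InterruptState) =
    (@atomicreplace s.running false => true).success

# Release the flag once handling has finished.
function end_handling!(s::InterruptState)
    @atomic s.running = false
    return nothing
end
```

Compared with `Threads.Atomic{Bool}`, this keeps the flag behind the `@atomic` field interface at the cost of a singleton type that exists only to hold it.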
Force-pushed from 4013d97 to 2a087a7.
Would people with access to Windows and Mac systems mind helping figure out what's wrong with those sides of the implementation? I have it working quite nicely locally on Linux, but I don't know how best to debug the hangs in CI on other platforms.
Interrupt handling is a tricky problem, not just in terms of implementation, but in terms of desired behavior: when an interrupt is received, which code should handle it? Julia's current answer to this is effectively to throw an `InterruptException` to the first task to hit a safepoint. While this seems sensible (the code that's running gets interrupted), it only really works for very basic numerical code. In the case that multiple tasks are running concurrently, or when try-catch handlers are registered, this system breaks down, and results in unpredictable behavior. This unpredictable behavior includes:

- Interrupting background/runtime tasks which don't want to be interrupted, as they do little bits of important work (and are critical to library runtime functionality)
- Interrupting only one task, when multiple coordinating tasks would want to receive the interrupt to safely terminate a computation
- Interrupting only one library's task, when multiple libraries really would want to be notified about the interrupt

The above behavior makes it nearly impossible to provide reliable Ctrl-C behavior, and results in very confused users who get stuck hitting Ctrl-C continuously, sometimes getting caught in a hang, sometimes triggering unrelated exception handling code they didn't mean to, sometimes getting a segfault, and very rarely getting the behavior they desire (with unpredictable safety of being able to continue using the active session as intended).

This commit provides an alternative behavior for interrupts which is more predictable: user code may now register tasks as "interrupt handlers" (via `Base.register_interrupt_handler`), which will be guaranteed to receive an `InterruptException` whenever the session receives an interrupt signal. Additionally, unlike the previous behavior, no other tasks will receive `InterruptException`s; only explicitly registered handlers may receive them.

This behavior allows one or more libraries to register handler tasks which will all be concurrently awoken to handle each interrupt and do whatever is necessary to safely interrupt any running code; the extent to which other tasks are interrupted is arbitrary and library-defined. For example, GPU libraries like AMDGPU.jl can register a handler to safely interrupt GPU kernels running on all GPU queues and do resource cleanup. Concurrently, a complex runtime like the scheduler in Dagger.jl can register a handler to interrupt running tasks on other workers when possible.

This commit also adds a more convenient interface for when the REPL is running. When a Ctrl-C is received and the user is not at the REPL prompt, a TerminalMenus-powered prompt will be shown, where the user will have a variety of possible actions, including:

- Ignore the interrupt (do nothing)
- Activate all module's interrupt handlers
- Activate a specific module's interrupt handlers
- Disable the interrupt handler (reverting to the old Ctrl-C behavior)
- Exit Julia gracefully (with `exit()`)
- Exit Julia forcefully (with a `ccall` to `abort`)
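The registration model described in the commit message can be approximated in user code, which may help when reasoning about its semantics. The following is a standalone sketch, not the implementation in this PR: `register_handler!`, `deliver_interrupt!`, `parked_handler`, `HANDLERS`, and `CLEANED_UP` are hypothetical stand-ins, and the real signature of `Base.register_interrupt_handler` is not assumed. It relies only on existing Base behavior: a task parked in `wait()` can be woken with an exception via `schedule(task, exc; error=true)`.

```julia
# Hypothetical stand-in for a per-module handler registry; not the Base implementation.
const HANDLERS_LOCK = ReentrantLock()
const HANDLERS = Dict{Module,Vector{Task}}()
const CLEANED_UP = Symbol[]   # records which demo handlers ran their cleanup

# Illustrative analogue of registering a handler task for a module.
function register_handler!(m::Module, task::Task)
    lock(HANDLERS_LOCK) do
        push!(get!(() -> Task[], HANDLERS, m), task)
    end
    return task
end

# Wake the live handlers of `target` (or of every module when `target` is nothing)
# with an InterruptException; returns how many tasks were woken.
function deliver_interrupt!(target::Union{Module,Nothing}=nothing)
    tasks = lock(HANDLERS_LOCK) do
        mods = target === nothing ? collect(keys(HANDLERS)) : [target]
        woken = Task[]
        for m in mods
            append!(woken, filter(!istaskdone, get(HANDLERS, m, Task[])))
        end
        woken
    end
    for t in tasks
        schedule(t, InterruptException(); error=true)  # raised inside the parked task
    end
    return length(tasks)
end

# Build a handler task that parks until interrupted, then runs its cleanup.
function parked_handler(name::Symbol)
    parked = Ref(false)
    t = Task() do
        try
            parked[] = true
            wait()                     # park until explicitly rescheduled
        catch err
            err isa InterruptException || rethrow()
            push!(CLEANED_UP, name)    # stand-in for library-defined cleanup
        end
    end
    schedule(t)
    while !parked[]                    # the task is sticky, so it runs on this thread
        yield()
    end
    return t
end
```

For example, `register_handler!(Main, parked_handler(:gpu))` followed by `deliver_interrupt!(Main)` wakes only that module's handlers, which mirrors the "activate a specific module's interrupt handlers" menu entry; calling `deliver_interrupt!()` with no argument corresponds to the interrupt-all behavior.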
Force-pushed from 2a087a7 to 9ddbf8b.
Thanks so much for your work here @jpsamaroo! I was sent here from fonsp/Pluto.jl#452, and I took an interest in the progress of this pull request. I hope you don't mind my sharing this, but I just made a list of the commits on this branch to help myself track the history of this patch's development. This was called for because of the liberal use of rebases and force pushes on the branch. Hopefully this is useful to you and/or to anyone else who's interested in following the development of this feature. Once again, thank you so much for your hard work; I'm eager to see this merged into
|
On a related note, likely apropos #6283, I have had a need for cancellable In some cases, these channels represent input from external systems (hardware), so a task (A) might be blocked on I was expecting https://github.com/davidanthoff/CancellationTokens.jl to be part of the solution (cf. davidanthoff/CancellationTokens.jl#14): It might be an idea to also peek at the .NET cancellation framework. |
This PR is the first necessary step towards robust cancellation - without a safe, predictable interrupt mechanism, there is no way to reliably chain interrupts to cancellation behaviors. Any attempt to implement "good" cancellation support in the language will require this PR, or something like it, to build upon. |
Thanks - looking forward to this landing! 🚀 I will probably be short on time, but let me know if further testing on macOS or Windows is needed. |
I would really appreciate both testing and fixes for those platforms - I don't have easy access to them, and don't understand how they handle signals. Implementing an equivalent |
OK - looking into the Windows part by building on this branch and the work done by @jakubwro in jakubwro/interrupt-handlers (in jakubwro@265eff0). Disclaimer: I have zero experience with implementing handling of Windows signals :-) The signal handling for Windows is a bit confusing compared to the signal handling for Unix (and Mach):
I guess those two signal handlers should then call "an equivalent Not sure how to implement And perhaps Seemingly relevant information
Cf. http://darmawan-salihun.blogspot.com/2017/05/signal-handling-in-windows-console.html?m=1
Cf. https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/signal?view=msvc-170#remarks |
Pushed a WIP to Struggled a bit with just compiling (with msys2-provided mingw), but #51682 and the workaround from #51740 seem to have helped.
It might work...? On Windows, using stemann@7c37470, running
|
That analysis reminds me that packages have had the option of opting into safe signal handling with |
This is really interesting, I understand idea here is to run all |
Yes, roughly so. Though note that you should not be using |
Right, I was able to reproduce this crash. My idea was to interrupt the REPL when run in an interactive session, and do the actual cleanup when run as an app without the REPL. I'm still not sure how to do this REPL interrupt correctly.
Right. One of the main distinctions of this PR is that it effectively sets `disable_sigint` on all other tasks, except the ones that are given the job of handling the signal.
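For comparison, this is roughly what the existing per-call opt-in looks like with `disable_sigint`, which is already exported from Base. The function `critical_update!` is a made-up example; the point is only that today each interrupt-unsafe region has to be wrapped explicitly, whereas under this PR the protection would be the default for every task that is not a registered handler.

```julia
# `critical_update!` is a hypothetical example; `disable_sigint` is the existing Base function.
function critical_update!(buf::Vector{Int})
    # Ctrl-C delivery to this task is deferred until the block returns,
    # so the update cannot be abandoned halfway through.
    disable_sigint() do
        for i in eachindex(buf)
            buf[i] += 1
        end
    end
    return buf
end
```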
I did more experimenting with Line 2735 in 29d78fa
|
Issue flag: when using this updated version of Julia and after sending signal interrupts, REPL continuously spawns |
Interrupt handling is a tricky problem, not just in terms of implementation, but in terms of desired behavior: when an interrupt is received, which code should handle it? Julia's current answer to this is effectively to throw an `InterruptException` to the first task to hit a safepoint. While this seems sensible (the code that's running gets interrupted), it only really works for very basic numerical code. In the case that multiple tasks are running concurrently, or when try-catch handlers are registered, this system breaks down, and results in unpredictable behavior. This unpredictable behavior includes:

- Interrupting background/runtime tasks which don't want to be interrupted, as they do little bits of important work (and are critical to library runtime functionality)
- Interrupting only one task, when multiple coordinating tasks would want to receive the interrupt to safely terminate a computation
- Interrupting only one library's task, when multiple libraries really would want to be notified about the interrupt

The above behavior makes it nearly impossible to provide reliable Ctrl-C behavior, and results in very confused users who get stuck hitting Ctrl-C continuously, sometimes getting caught in a hang, sometimes triggering unrelated exception handling code they didn't mean to, sometimes getting a segfault, and very rarely getting the behavior they desire (with unpredictable safety of being able to continue using the active session as intended).

This commit provides an alternative behavior for interrupts which is more predictable: user code may now register tasks as "interrupt handlers", which will be guaranteed to receive an `InterruptException` whenever the session receives an interrupt signal. Additionally, when any interrupt handlers are registered, no other tasks will receive `InterruptException`s; only the handlers may receive them.

This behavior allows one or more libraries to register handler tasks which will all be concurrently awoken to handle each interrupt and do whatever is necessary to safely interrupt any running code; the extent to which other tasks are interrupted is library-defined. For example, GPU libraries like AMDGPU.jl can register a handler to safely interrupt GPU kernels running on all GPU queues and do resource cleanup. Concurrently, a complex runtime like the scheduler in Dagger.jl can register a handler to interrupt running tasks on other workers when possible.
Todo:

- Consider whether to allow `force` mode to still operate with handlers registered (Not doing this for now)
- Add `@interrupthandler` API (suggested by @Seelengrab) (Punting as we don't have a way to register the handler during `__init__`)
- `jl_task_get_next`
Fixes #34184
Fixes #19222