Segfault with SIGINT #19222

Closed
n3f4s opened this issue Nov 4, 2016 · 12 comments · May be fixed by #49541
Labels
error handling Handling of exceptions by Julia or the user

Comments

@n3f4s

n3f4s commented Nov 4, 2016

Hi,
When I send a SIGINT to julia (kill <julia's pid> for the REPL or CTRL+C or kill ... for a script), there is a segfault:

julia>
### kill <julia's pid> ####
signal (15): Terminated
while loading no file, in expression starting on line 0
__kernel_vsyscall at linux-gate.so.1 (unknown line)
syscall at /usr/bin/../lib/libc.so.6 (unknown line)
unknown function (ip: 0xb61bae38)
unknown function (ip: 0xb61b9009)
uv_run at /usr/bin/../lib/libjulia.so.0.5 (unknown line)
jl_run_once at /usr/bin/../lib/libjulia.so.0.5 (unknown line)
unknown function (ip: 0xb359d4ab)
unknown function (ip: 0xb359da59)
unknown function (ip: 0xb359e2b0)
unknown function (ip: 0xb35a70a1)
unknown function (ip: 0xb35a724c)
unknown function (ip: 0xb35a76b5)
unknown function (ip: 0xb35a783d)
jl_apply_generic at /usr/bin/../lib/libjulia.so.0.5 (unknown line)
unknown function (ip: 0xb35a6679)
unknown function (ip: 0xb35a66b6)
jl_apply_generic at /usr/bin/../lib/libjulia.so.0.5 (unknown line)
unknown function (ip: 0xb35a8c20)
unknown function (ip: 0xb35aad2b)
unknown function (ip: 0xb35aae66)
jl_apply_generic at /usr/bin/../lib/libjulia.so.0.5 (unknown line)
unknown function (ip: 0xb35b42b2)
run_repl at ./REPL.jl:188
unknown function (ip: 0x89907432)
jl_apply_generic at /usr/bin/../lib/libjulia.so.0.5 (unknown line)
unknown function (ip: 0xb35c81a0)
unknown function (ip: 0xb35c83a5)
jl_apply_generic at /usr/bin/../lib/libjulia.so.0.5 (unknown line)
unknown function (ip: 0x80492f8)
unknown function (ip: 0x8048d60)
__libc_start_main at /usr/bin/../lib/libc.so.6 (unknown line)
unknown function (ip: 0x8048dab)
unknown function (ip: 0xffffffff)
Allocations: 1005443 (Pool: 1004674; Big: 769); GC: 0

signal (11): Segmentation fault
while loading no file, in expression starting on line 0

I'm on Manjaro Linux (a distribution based on Arch Linux) and I'm using Julia 0.5.0 (the version available through pacman on Manjaro).

I will try with a debug build built from the GitHub repository as soon as possible.

[EDIT]: added the error message.

@yuyichao
Contributor

yuyichao commented Nov 4, 2016

This isn't really fixable.

@n3f4s
Author

n3f4s commented Nov 4, 2016

Why is it not fixable?

@yuyichao
Contributor

yuyichao commented Nov 4, 2016

(Also, kill pid does not send SIGINT)

It is not fixable without either not guaranteeing exit on signal, or not doing any cleanup on signal.
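For reference, `kill <pid>` without an explicit signal sends SIGTERM, which matches the `signal (15)` line in the trace above; the later `signal (11)` is the segfault itself. A minimal Python sketch, assuming a Linux system where these signal numbers apply, prints the names and numbers involved. `describe` is a helper name made up for this illustration.

```python
import signal


def describe(signum: int) -> str:
    """Return 'NAME (number)' for a signal number."""
    return f"{signal.Signals(signum).name} ({int(signum)})"


# SIGINT is what Ctrl-C sends; SIGTERM is the default for `kill <pid>`;
# SIGSEGV is the segmentation fault reported at the end of the trace.
for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGSEGV):
    print(describe(sig))
```

To actually send SIGINT from a shell, the signal has to be named explicitly, for example `kill -INT <pid>`.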

@n3f4s
Author

n3f4s commented Nov 4, 2016

(yes, my bad, I never get the default signal right for kill)

Is it not possible to do something like Python and Ruby do: throw an exception when receiving a "stop" (INT/TERM...) signal? Cleanup would be deferred to the exception handling process if the exception isn't caught.
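As an illustration of the behaviour being asked for, here is a minimal Python sketch. By default Python only turns SIGINT into an exception (`KeyboardInterrupt`); for SIGTERM a handler has to be installed by hand, which is what this sketch does. `TerminationRequested`, `run_with_cleanup` and `terminate_self` are hypothetical names for this example, and `signal.raise_signal` assumes Python 3.8 or newer.

```python
import signal
import time


class TerminationRequested(Exception):
    """Hypothetical exception type used to surface SIGTERM as an exception."""


def _raise_on_sigterm(signum, frame):
    # Runs in the main thread the next time the interpreter checks for signals.
    raise TerminationRequested(signal.Signals(signum).name)


def run_with_cleanup(work):
    """Call work() with SIGTERM mapped to an exception; return what happened."""
    events = []
    previous = signal.signal(signal.SIGTERM, _raise_on_sigterm)
    try:
        work()
        events.append("finished")
    except TerminationRequested as exc:
        events.append(f"interrupted by {exc}")
    finally:
        events.append("cleanup")  # runs whether or not the signal arrived
        # Restore the earlier disposition (None means it was not set from Python).
        signal.signal(signal.SIGTERM, previous if previous is not None else signal.SIG_DFL)
    return events


def terminate_self():
    """Stand-in for an external `kill <pid>`: send SIGTERM to this process."""
    signal.raise_signal(signal.SIGTERM)
    time.sleep(0.01)  # give the interpreter a point at which to run the handler


print(run_with_cleanup(terminate_self))
print(run_with_cleanup(lambda: None))
```

The cleanup step runs in both cases, but only because the handler gets a chance to run: the exception is raised when the interpreter next checks for pending signals, not at the instant the signal arrives.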

@yuyichao
Contributor

yuyichao commented Nov 4, 2016

Is it not possible to do something like Python and Ruby do: throw an exception when receiving a "stop" (INT/TERM...) signal?

We do that in the repl for SIGINT (or when you turn off exit_on_sigint). Ctrl-C in REPL shouldn't segfault.

@yuyichao
Contributor

yuyichao commented Nov 4, 2016

And that's what I mean by not guaranteeing exit. The process can wait for an arbitrarily long time before the exception is delivered.

@n3f4s
Author

n3f4s commented Nov 4, 2016

We do that in the repl for SIGINT (or when you turn off exit_on_sigint). Ctrl-C in REPL shouldn't segfault.

Why not do the same for SIGTERM?
(CTRL-C doesn't work with Pkg.clone. A Pkg.clone over a bad connection is what made me kill julia and find the segfault, but I'm not sure whether that is off topic.)

And that's what I mean by not guaranteeing exit. The process can wait for an arbitrarily long time before the exception is delivered.

"Transforming" the signal into an exception seems to work fine for Ruby and Python (I never had any problem with CTRL-C in those languages). What could delay the exception?
(I'm curious, I'm not doubting what you are saying, I don't want you to take my questions the wrong way)

@yuyichao
Contributor

yuyichao commented Nov 4, 2016

Why not do the same for SIGTERM?

I don't think python does it either. And I think TERM usually means the program shouldn't really continue.

"Transforming" the signal into an exception seems to work fine for Ruby and Python (I never had any problem with CTRL-C in those languages). What could delay the exception?

Native code. The interpreters for Python and Ruby are also slow enough that periodically checking a flag does not slow them down by much.
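The flag-checking idea can be modelled in a few lines of Python. This is only a simulation under stated assumptions, not how any particular runtime is implemented: the signal handler is reduced to setting a flag, `signal_at` pretends a signal arrives during that step, and `run_until_stopped` is a made-up name. `check_every=None` stands in for native code that never polls.

```python
def run_until_stopped(iterations, check_every, signal_at):
    """Run up to `iterations` steps and return how many completed.

    The loop looks at the stop flag once every `check_every` steps, the way
    an interpreter polls between instructions. With check_every=None the
    flag is never examined, so the request goes unnoticed.
    """
    stop_requested = False
    completed = 0
    for step in range(iterations):
        if check_every is not None and step % check_every == 0 and stop_requested:
            break  # an interpreter would raise the exception here
        # ... one unit of work would happen here ...
        if step == signal_at:
            stop_requested = True  # all that the (simulated) handler does
        completed += 1
    return completed


for interval in (1, 1000, None):
    print(interval, run_until_stopped(10_000, interval, signal_at=5))
```

With a request arriving at step 5, the loop stops after 6 steps when it polls every step, after 1000 steps when it polls every 1000 steps, and runs all 10000 steps when it never polls. Frequent polling keeps the delay short but costs time on every step, which is the trade-off described above.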

@n3f4s
Author

n3f4s commented Nov 8, 2016

And I think TERM usually means the program shouldn't really continue

You can "trap" a SIGTERM and ignore it (you shouldn't, but it's possible).

Stopping execution without cleaning up is a better way to handle signals than raising a segfault. Some languages don't clean up after an INT or TERM. For example, in C++ programs, destructors are not run (by default) when a TERM/INT signal is received.
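Both options can be sketched in Python, assuming a POSIX system. `survives_sigterm_when_ignored` and `exit_without_cleanup` are hypothetical names for this example; the second handler is only defined, not installed, because installing it would end the process on the next SIGTERM.

```python
import os
import signal


def exit_without_cleanup(signum, frame):
    """Handler that ends the process at once: no finally blocks, no atexit hooks."""
    os._exit(128 + signum)  # 128 + signal number is a common shell convention


def survives_sigterm_when_ignored():
    """Ignore SIGTERM, send it to ourselves, and report that we are still running."""
    previous = signal.signal(signal.SIGTERM, signal.SIG_IGN)
    try:
        os.kill(os.getpid(), signal.SIGTERM)  # discarded because it is ignored
        return True
    finally:
        # Restore the earlier disposition (None means it was not set from Python).
        signal.signal(signal.SIGTERM, previous if previous is not None else signal.SIG_DFL)


print(survives_sigterm_when_ignored())
```

Installing `exit_without_cleanup` with `signal.signal(signal.SIGTERM, exit_without_cleanup)` would give the "stop without cleaning up" behaviour, at the cost of skipping any pending cleanup code.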

@lleiding

Is there any news on this (see also #25308)?
The current signal handling is more than just annoying. With a job management engine like SLURM, it leaves a number of zombie processes behind whenever jobs (julia scripts) get killed, e.g. for exceeding memory limits. That means having to ssh into the nodes and kill the processes manually, which of course only works if you have ssh access. Otherwise the computing cluster gets cluttered up over time.
In the meantime, is there any workaround one could use?

running julia 0.6.x btw.

@dpinol

dpinol commented Oct 1, 2021

I was not able to reproduce the behavior documented at https://docs.julialang.org/en/v1/base/c/#Base.exit_on_sigint with CTRL-C during either sleep or readline.
In all combinations of Julia 1.6.3/1.7rc1 and Linux/Windows, I never managed to get the code in the "catch" block to execute.

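Base.exit_on_sigint(false)  # ask for Ctrl-C to be delivered as InterruptException instead of exiting
try
  sleep(300)
  # readline()  # alternative blocking call that was also tested
catch e
  println("caught: ", e)
finally
  println("finally")
end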

Only with Julia 1.7, I get this stacktrace, but the "catch" clause is never executed.

Unhandled Task ERROR: InterruptException:
Stacktrace:
 [1] poptask(W::Base.InvasiveLinkedListSynchronized{Task})
   @ Base ./task.jl:814
 [2] wait()
   @ Base ./task.jl:823
 [3] wait(c::Condition)
   @ Base ./condition.jl:112
 [4] macro expansion
   @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Distributed/src/remotecall.jl:253 [inlined]
 [5] (::Distributed.var"#133#134")()
   @ Distributed ./task.jl:411

With "readline", the "finally" clause is only executed when I press return after the CTRL-C.
With "sleep", I need to press CTRL-C twice to quit the application, and the "finally" clause is never executed either.

thanks

@brenhinkeller added the "error handling" label on Nov 21, 2022
@vtjnash
Member

vtjnash commented Jan 31, 2024

Will be closed by #49541
