[ocaml5-issue] Out_channel.flush can cause Syserror when used in parallel #444
Comments
Just observed this again - now on FreeBSD 5.1 trying out the latest QCheck memory usage improvement
I've created a standalone reproducer for this one:

let test () =
  let path = Filename.temp_file "stm-" "" in
  let channel = Atomic.make (Out_channel.open_text path) in
  (* First, a bit of one-domain channel activity *)
  Out_channel.output_byte (Atomic.get channel) 1;
  (try Out_channel.flush (Atomic.get channel) with (Sys_error _) -> assert false);
  let wait = Atomic.make true in
  (* Domain 1 closes-opens-outputs repeatedly *)
  let d1 = Domain.spawn (fun () ->
      while Atomic.get wait do Domain.cpu_relax() done;
      for _ = 1 to 50 do
        Out_channel.close (Atomic.get channel);
        Atomic.set channel (Out_channel.open_text path);
        Out_channel.output_byte (Atomic.get channel) 1;
      done;
    ) in
  (* Domain 2 calls flush and pos repeatedly *)
  let d2 = Domain.spawn (fun () ->
      Atomic.set wait false;
      (*
         "Output functions raise a Sys_error exception when they are applied to a closed output channel,
          except Out_channel.close and Out_channel.flush, which do nothing when applied to an already closed channel."
      *)
      for _ = 1 to 50 do
        (try Out_channel.flush (Atomic.get channel)
         with (Sys_error msg) -> Printf.printf "Out_channel.flush raised Sys_error %S\n%!" msg; assert false);
        ignore (Out_channel.pos (Atomic.get channel));
      done;
    ) in
  let () = Domain.join d1 in
  let () = Domain.join d2 in
  (* Please leave the torture chamber nice and clean as you found it *)
  (try Out_channel.close (Atomic.get channel) with Sys_error _ -> ());
  Sys.remove path

let _ =
  for i = 1 to 5_000 do
    if i mod 250 = 0 then Printf.printf "#%!";
    test ()
  done

With it, locally on Linux I am able to trigger the assertion failure on 5.1.0+fp and 5.2.0+fp switches.
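To get a feel for how often the failure shows up rather than stopping at the first hit, the reproducer can be turned into a counter. The following is only a sketch along the lines of the reproducer above: `count_flush_errors` and its `~rounds` parameter are hypothetical names introduced here, the `Out_channel.pos` call is dropped, and the count will be 0 on a runtime that never raises from `flush`.

```ocaml
(* Hypothetical variant of the reproducer: instead of asserting, count how
   many times Out_channel.flush raises Sys_error while another domain
   repeatedly closes and reopens the shared channel. Requires OCaml 5. *)
let count_flush_errors ~rounds =
  let path = Filename.temp_file "flush-count-" "" in
  let channel = Atomic.make (Out_channel.open_text path) in
  let wait = Atomic.make true in
  (* Domain 1 closes-opens-outputs repeatedly, as in the reproducer *)
  let closer = Domain.spawn (fun () ->
      while Atomic.get wait do Domain.cpu_relax () done;
      for _ = 1 to rounds do
        Out_channel.close (Atomic.get channel);
        Atomic.set channel (Out_channel.open_text path);
        Out_channel.output_byte (Atomic.get channel) 1
      done) in
  (* Domain 2 flushes repeatedly and records every Sys_error *)
  let flusher = Domain.spawn (fun () ->
      Atomic.set wait false;
      let errors = ref 0 in
      for _ = 1 to rounds do
        (try Out_channel.flush (Atomic.get channel)
         with Sys_error _ -> incr errors)
      done;
      !errors) in
  Domain.join closer;
  let errors = Domain.join flusher in
  (try Out_channel.close (Atomic.get channel) with Sys_error _ -> ());
  Sys.remove path;
  errors

let () =
  let total = ref 0 in
  for _ = 1 to 100 do
    total := !total + count_flush_errors ~rounds:50
  done;
  Printf.printf "Out_channel.flush raised Sys_error %d times\n%!" !total
```

The result is timing-dependent, so a count of 0 on one machine does not rule out the race on another.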
The branch https://github.com/ocaml-multicore/multicoretests/tree/flushtest-repro-focus triggers a focused test of it on the CI. Across all runs, this reproducer only triggered in multicoretests-ci where
In trying to understand this better, I ran
The same test, however, completes under the following two, which seems to indicate an fp-related bug that may have since been fixed on trunk:
A standalone reproducer:

let test () =
  let path = Filename.temp_file "stm-" "" in
  let channel = Out_channel.open_text path in
  Out_channel.close_noerr channel;
  (try Out_channel.output_char channel 'A'; assert false with (Sys_error _) -> ()); (* Error (Sys_error("Bad file descriptor")) *)
  (try Out_channel.output_char channel 'B'; assert false with (Sys_error _) -> ()); (* Succeeds *)
  Sys.remove path

let _ =
  for i = 1 to 10_000 do
    if i mod 250 = 0 then Printf.printf "#%!";
    test ()
  done
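For comparison, the documented sequential behaviour quoted in the first reproducer (output functions raise `Sys_error` on a closed channel, except `close` and `flush`, which do nothing) can be checked in a single pass. This is a minimal sketch: `raises_sys_error` and `check_closed_channel` are hypothetical helpers, not part of the test suite, and only the first output after closing is examined.

```ocaml
(* Hypothetical helper: run [f] and report whether it raised Sys_error. *)
let raises_sys_error f =
  try f (); false with Sys_error _ -> true

(* Close a channel, then try flush, close and output_char on it in turn,
   all from a single domain. Returns which of the three raised. *)
let check_closed_channel () =
  let path = Filename.temp_file "closed-" "" in
  let oc = Out_channel.open_text path in
  Out_channel.output_char oc 'A';
  Out_channel.close oc;
  let flush_raised = raises_sys_error (fun () -> Out_channel.flush oc) in
  let close_raised = raises_sys_error (fun () -> Out_channel.close oc) in
  let output_raised =
    raises_sys_error (fun () -> Out_channel.output_char oc 'B') in
  Sys.remove path;
  (flush_raised, close_raised, output_raised)

let () =
  let (f, c, o) = check_closed_channel () in
  Printf.printf "flush raised: %b, close raised: %b, output_char raised: %b\n"
    f c o
```

Going by the documentation, one would expect `flush` and `close` not to raise and `output_char` to raise. The reproducer above differs in that it is the second `output_char` after closing that unexpectedly succeeds.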
To summarize
This means that we are only missing an explanation of the above 5.2.0+fp standalone reproducers to close this issue.
I ran into this too when running the test suite with Jane Street's branch of the OCaml compiler. My explanation seems to apply equally to the most recent head of ocaml/ocaml, so I thought I'd share it here. Explanation: the STM model expects that
If my explanation is correct, then maybe
(The issue is probably more widespread than suggested by my last message: I see that
Thanks for sharing! 🙏
I've now created a reproducer and filed this upstream: ocaml/ocaml#13586
A Cygwin 5.2 run triggered when merging #443 to main found an unexpected counterexample to STM Out_channel parallel: https://github.com/ocaml-multicore/multicoretests/actions/runs/8361686850/job/22890359851
AFAICS, the failure can be explained by Flush in the "right leg" causing Sys_error("Bad file descriptor").
This goes against the specification, which says:
This is currently captured in the STM Out_channel test as: