Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation requests #13

Open
talex5 opened this issue Dec 28, 2022 · 1 comment
Open

Documentation requests #13

talex5 opened this issue Dec 28, 2022 · 1 comment

Comments

@talex5
Copy link
Contributor

talex5 commented Dec 28, 2022

I think I'm a bit confused about how to use dscheck. Perhaps other people will be too.

The README says:

At the core, dscheck runs each test numerous times, exhaustively trying every single interleaving of atomic operations.

I tried adding some prints to it:

module Atomic = Dscheck.TracedAtomic
(* tested implementation needs to use dscheck's atomic module *)

let test_counter () =
  print_endline "\nstart";
  let counter = Atomic.make 0 in
  let incr () =
    print_endline "start incr";
    Atomic.set counter (Atomic.get counter + 1);
    print_endline "end incr";
  in 
  Atomic.spawn incr;
  Atomic.spawn incr;
  Atomic.final (fun () ->
      print_endline "end";
      Atomic.check (fun () -> Atomic.get counter == 2)
    )

let () = Atomic.trace test_counter

Many of the runs don't seem to finish. Is this expected? e.g. I get:

start
start incr

start
start incr

start
start incr
end incr

...

It would be useful if it could run without logging while searching, and then run the failing test case again with logging on. The default trace output doesn't seem very useful (e.g. it's hard to know what Process 3: compare_and_swap 14 is referring to).

I had assumed I could use non-atomic variables to keep track of the expected state and then compare that against the atomic values. e.g. adding a value to a lock-free queue and also to a regular Stdlib.Queue and checking the final contents are the same. Is that unsafe?

Does Atomic.spawn always run the function immediately, up to the first atomic operation? For example, this doesn't find the error:

let test_counter () =
  print_endline "\nstart";
  let x = Atomic.make true in   (* This is supposed to be false *)
  let might_be_set = ref false in
  Atomic.spawn (fun () ->
      print_endline "Setting...";
      might_be_set := true;
      Atomic.set x true;
      print_endline "Set";
    );
  Atomic.spawn (fun () ->
      print_endline "Checking";
      if not !might_be_set then assert (Atomic.get x = false)
    );
  Atomic.final (fun () ->
      print_endline "end";
    )

But switching the order of the spawns does find it. I guess if another process depends on the value then I need to use an atomic.

I tried doing this to encourage it to switch:

let yield = Atomic.make () in
...
Atomic.spawn (fun () -> Atomic.get yield; ...)

That worked if I put it in both spawns, but not in just one of them.

Basically, it would be useful to know what things are safe to do and which prevent it from testing all cases.

(Also: the example in the README doesn't do anything unless you add let () = Atomic.trace test_counter. It would be good to include this - maybe tested with MDX?).

@sadiqj
Copy link
Collaborator

sadiqj commented Dec 28, 2022

Will come back with more on the other points in a few days.

First, we should change the README. dscheck doesn't exhaustively explore all interleavings, only all interleavings that have conflicting atomic operations which could cause a race. Consider two atomic variables a and b, the following two interleavings result in the same behaviour:

t0 = a (thread 1) /* read a */
t1 = b (thread 2) /* read b */
b = t0 (thread 1) /* write b */
a = t1 (thread 2) /* write a */

--

t1 = b (thread 2) /* read b */
t0 = a (thread 1) /* read a */
b = t0 (thread 1) /* write b */
a = t1 (thread 2) /* write a */

The implementation of DPOR in dscheck will cull the search space for one of those two orderings of the read of b and read of a. Since there are potentially many interleavings that could start with those two orderings, we get an exponential speedup and can model check things that wouldn't be possible otherwise.

Second, yes it is expected that every execution won't run to completion. To make the implementation a little less hairy and to cope without multishot continuations, we internally work with schedules.

These are the series of operations and the process (thread) to run next at each atomic operation. We have do_run: https://github.com/ocaml-multicore/dscheck/blob/main/src/tracedAtomic.ml#L168 where we run the function you pass trace against a schedule and get back the state of all processes at the final atomic operation in the schedule. We then construct a new schedule and re-run things. So for a trace a hundred atomic operations deep, we might run things a hundred times (though with only one execution actually making it the full way through).

Third, spawns aren't run immediately. Your test_counter function is run many times and the ordering of the spawns is changed (if it would lead to different behaviour).

Given the above in one and two you probably don't want to be using shared non-atomics between the Atomic spawns.

I have to run but will come back to this tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants