Documentation requests #13

talex5 · 2022-12-28T16:54:15Z

I think I'm a bit confused about how to use dscheck. Perhaps other people will be too.

The README says:

At the core, dscheck runs each test numerous times, exhaustively trying every single interleaving of atomic operations.

I tried adding some prints to it:

module Atomic = Dscheck.TracedAtomic
(* tested implementation needs to use dscheck's atomic module *)

let test_counter () =
  print_endline "\nstart";
  let counter = Atomic.make 0 in
  let incr () =
    print_endline "start incr";
    Atomic.set counter (Atomic.get counter + 1);
    print_endline "end incr";
  in 
  Atomic.spawn incr;
  Atomic.spawn incr;
  Atomic.final (fun () ->
      print_endline "end";
      Atomic.check (fun () -> Atomic.get counter == 2)
    )

let () = Atomic.trace test_counter

Many of the runs don't seem to finish. Is this expected? e.g. I get:

start
start incr

start
start incr

start
start incr
end incr

...

It would be useful if it could run without logging while searching, and then run the failing test case again with logging on. The default trace output doesn't seem very useful (e.g. it's hard to know what Process 3: compare_and_swap 14 is referring to).

I had assumed I could use non-atomic variables to keep track of the expected state and then compare that against the atomic values. e.g. adding a value to a lock-free queue and also to a regular Stdlib.Queue and checking the final contents are the same. Is that unsafe?

Does Atomic.spawn always run the function immediately, up to the first atomic operation? For example, this doesn't find the error:

let test_counter () =
  print_endline "\nstart";
  let x = Atomic.make true in   (* This is supposed to be false *)
  let might_be_set = ref false in
  Atomic.spawn (fun () ->
      print_endline "Setting...";
      might_be_set := true;
      Atomic.set x true;
      print_endline "Set";
    );
  Atomic.spawn (fun () ->
      print_endline "Checking";
      if not !might_be_set then assert (Atomic.get x = false)
    );
  Atomic.final (fun () ->
      print_endline "end";
    )

But switching the order of the spawns does find it. I guess if another process depends on the value then I need to use an atomic.

I tried doing this to encourage it to switch:

let yield = Atomic.make () in
...
Atomic.spawn (fun () -> Atomic.get yield; ...)

That worked if I put it in both spawns, but not in just one of them.

Basically, it would be useful to know what things are safe to do and which prevent it from testing all cases.

(Also: the example in the README doesn't do anything unless you add let () = Atomic.trace test_counter. It would be good to include this - maybe tested with MDX?).

The text was updated successfully, but these errors were encountered:

sadiqj · 2022-12-28T20:26:10Z

Will come back with more on the other points in a few days.

First, we should change the README. dscheck doesn't exhaustively explore all interleavings, only all interleavings that have conflicting atomic operations which could cause a race. Consider two atomic variables a and b, the following two interleavings result in the same behaviour:

t0 = a (thread 1) /* read a */
t1 = b (thread 2) /* read b */
b = t0 (thread 1) /* write b */
a = t1 (thread 2) /* write a */

--

t1 = b (thread 2) /* read b */
t0 = a (thread 1) /* read a */
b = t0 (thread 1) /* write b */
a = t1 (thread 2) /* write a */

The implementation of DPOR in dscheck will cull the search space for one of those two orderings of the read of b and read of a. Since there are potentially many interleavings that could start with those two orderings, we get an exponential speedup and can model check things that wouldn't be possible otherwise.

Second, yes it is expected that every execution won't run to completion. To make the implementation a little less hairy and to cope without multishot continuations, we internally work with schedules.

These are the series of operations and the process (thread) to run next at each atomic operation. We have do_run: https://github.com/ocaml-multicore/dscheck/blob/main/src/tracedAtomic.ml#L168 where we run the function you pass trace against a schedule and get back the state of all processes at the final atomic operation in the schedule. We then construct a new schedule and re-run things. So for a trace a hundred atomic operations deep, we might run things a hundred times (though with only one execution actually making it the full way through).

Third, spawns aren't run immediately. Your test_counter function is run many times and the ordering of the spawns is changed (if it would lead to different behaviour).

Given the above in one and two you probably don't want to be using shared non-atomics between the Atomic spawns.

I have to run but will come back to this tomorrow.

talex5 mentioned this issue Dec 28, 2022

Backtracking optimizations #3

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Documentation requests #13

Documentation requests #13

talex5 commented Dec 28, 2022 •

edited

Loading

sadiqj commented Dec 28, 2022

Documentation requests #13

Documentation requests #13

Comments

talex5 commented Dec 28, 2022 • edited Loading

sadiqj commented Dec 28, 2022

talex5 commented Dec 28, 2022 •

edited

Loading