Source sets #18

bartoszmodelski · 2023-03-22T17:22:09Z

(Based on #19, #21)

PR replacing our current persistent sets with source sets as outlined in the Source Sets: A Foundation for Optimal Dynamic Partial Order Reuction. Source sets achieve a better reduction in general and optimal reduction on some tests.

Part of the improvement can be definitely attributed to the fact that spawns are not treated as independent operations and are not reordered with atomic operations (unless forced by required reorder of atomic operations).

Results

Test 1

(write A, write B), (write B, write A)

  Atomic.spawn (fun () ->
      Atomic.set a 1;
      Atomic.set b 2);
  Atomic.spawn (fun () ->
      Atomic.set b 3;
      Atomic.set a 4);
  Atomic.final (fun () ->
      Printf.printf "a=%d, b=%d\n%!" (Atomic.get a) (Atomic.get b))

main

a=4, b=3
a=4, b=3
a=4, b=2
a=4, b=3
a=4, b=2
a=4, b=2
a=1, b=2
num_traces: 7

Source-set

a=4, b=3
a=4, b=2
a=1, b=2
num_traces: 3

Test 2

(write A, write B), (write A, write B)

main:

a=3, b=4                            
a=3, b=4
a=3, b=4
a=3, b=2
a=3, b=4
a=3, b=4
a=3, b=2
a=1, b=4
a=1, b=2
num_traces: 9

Source-set

➜  dscheck git:(source-set) ✗ dune build && ./_build/default/tests/test_simple.exe -v
a=3, b=4                            
a=3, b=2
a=1, b=4
a=1, b=2

num_traces: 4

Existing tests

Safe list goes down from 130k traces to 23k traces. In this test all accesses target single location, so "by def" there's little DPOR algorithm can do with dependency relation specified as currently. I suspect the gain stems from being more careful with independent operations (i.e. spawns).
The most significant result is for MSQueue. For n=4 we drop from 12k states to 16. There's a lot of accesses on disjoint locations, which is precisely the scenario DPOR was developed for.

Logic

Just like with the previous algorithm, we execute the program on a single core and take control at calls to Atomic in order to force certain interleavings to be explored. We do that by stopping after every operation, looking at the most recent races and scheduling a trace that reverses the two racing accesses. Races here are pretty much accesses to shared variables made by two different threads.

We depart from the previous algorithm in the way sequences are scheduled. Suppose we have the following interleaving E.t.w.t', where t and t' are a race and w is an independent operation (say spawn of another thread).

Previously, we'd try to schedule the process of t' directly after E, to explore E.t'. .., and if proc(t') is not enabled, schedule all processes at E. Now, we're far more precise in that we're looking at all possible sequences that lead to the reversal (e.g. E.t'. .., E.w.t'. .., etc.). In fact, if E.w has already been explored, we know we don't have to schedule anything.
The sleep set is far more powerful, letting us ignore reversals of independent operations before they are scheduled.

Above may sound quite abstract. There is a lot of comments in the code that explain step-by-step how we achieve both of these things. Having said that, I would recommend at least skimming the sections 2-5 in the paper, even if just to double check me.

Other implementations

The paper focuses on the Concuerror project, which I didn't find particularly helpful as it focuses on Erlang's CSP. However, the same research group produced https://github.com/nidhugg/nidhugg, which is really close to what we want.

Testing

All existing tests pass and explore the same traces as DPOR. I've also used program_gen a lot, both to compare with existing dpor for smaller programs and with random exploration for larger ones (that source dpor can terminate on, while existing dpor not quite).

Future work

There's a few possible next steps:

Improve the dependency relation. At the moment we assume that all accesses on the same variable are races, but this is really not the case: two reads commute, two failing CASes commute, two writes with the same value commute, etc.
The paper with source sets also introduces wake-up trees. They are supposed to bring us to the optimal number of sequences (i.e. one sequence per trace). Apparently, they are not quite optimal but it's still an improvement. I'm a bit skeptical though as they use exponential memory and that could have impact on lockfree's testsuite.
TruST, which claims to solve all our problems: be linear in time, space and easily parallelizable. It's a fairly recent paper but with a public implementation.

art-w · 2023-03-23T16:42:54Z

Very cool!! This is really subtle to review, I like the strategy but I don't fully understand the code, there's some weird bug lurking somewhere:

module Atomic = Dscheck.TracedAtomic

let test () =
  let a = Atomic.make 0 in
  let b = Atomic.make 0 in
  let c = Atomic.make 0 in
  let ok = Atomic.make false in
  Atomic.spawn (fun () ->
    Atomic.set a 1 ;
    Atomic.set b 1 ;
  );
  Atomic.spawn (fun () ->
    Atomic.set c 1 ;
    Atomic.set c 2 ;
    Atomic.set c 3 ;
    Atomic.set b 2 ;
  );
  Atomic.spawn (fun () ->
    if Atomic.get c = 2 && Atomic.get b = 0
    then Atomic.set ok true
  );
  Atomic.final (fun () ->
    Format.printf "a=%i b=%i c=%i ok=%b@."
      (Atomic.get a)
      (Atomic.get b)
      (Atomic.get c)
      (Atomic.get ok)
  )

let () = Atomic.trace test

(*
a=1 b=2 c=3 ok=false
a=1 b=2 c=3 ok=false
a=1 b=2 c=3 ok=false
a=1 b=2 c=3 ok=false
a=1 b=2 c=3 ok=false
a=1 b=1 c=3 ok=false

num_traces: 6
*)

There should be a window for ok=true?.. (dscheck main says yes, so I think this is new)

Do you think you could test it with #15? There are some existing bugs in dscheck so it's hard to tell if an optimization is correct or if it introduces new issues :/

bartoszmodelski · 2023-04-04T16:36:30Z

Yeah, this is not looking correct. I managed to simplify the failing case to the following:

let test () =
  let c = Atomic.make 0 in
  let b = Atomic.make 0 in
  let ok = Atomic.make false in

  Atomic.spawn (fun () -> Atomic.set b 1);
  Atomic.spawn (fun () ->
      Atomic.set c 1;
      Atomic.set b 2);
  Atomic.spawn (fun () ->
      if Atomic.get c = 0 then if Atomic.get b = 0 then Atomic.set ok true);

  Atomic.final (fun () ->
      Format.printf "b=%i c=%i ok=%b@." (Atomic.get b) (Atomic.get c)
        (Atomic.get ok))

In the 3/4 trace we observe P2's accessing b but it does not bubble down to precede P0's.

bartoszmodelski · 2023-04-05T17:57:42Z

Ok, I have some idea about what's happening.

The general principle behind DPOR algos is discovery of reversible races. Whenever we discover one, we need to ensure that a new trace containing the reversal gets explored at some point. Source-DPOR generates "initials" of execution histories leading to reversal and if none of the initials is in the relevant backtrack, one (arbitrarily chosen) initial gets added. That should lead to the new trace. However, it is not what I'm seeing in the case above. There, the fourth (last) trace generates two initials: 0, 2. But 0 is in the sleep set already and inserting it into backtrack does not have any impact. On the other hand, choosing 2 lets us explore all the traces (while still visiting only 7% of the execution histories that vanilla DPOR sees). Afaict we generate the initials and sleep sets correctly, thus I've added a bias to prefer to backtrack unblocked/unexplored transitions and that seems to help.

I'll do more automated testing.

kayceesrk · 2023-04-07T09:16:05Z

This looks very promising. Does this algorithm explore all the "interesting" schedules that the original algorithm explores?

bartoszmodelski · 2023-04-07T09:56:53Z

As far as the tests we have, yes.

I'm a little uneasy with the fact that we had to specialize on the algo from the paper to handle the case above correctly. I'm working on tooling to automatically search for programs (borrowing ideas from #14), where source-dpor explores less traces than the current implementation.

(Incidentally, it confirms that the 12k execution sequences we explore for michael scott indeed fold into just 16 traces).

Related: #19

art-w · 2023-04-11T12:44:13Z

src/tracedAtomic.ml

+                    IntSet.(min_elt_opt (diff initials prefix_top_sleep))
+                  with
+                  | Some initial -> initial
+                  | None -> IntSet.min_elt initials


Hm so I don't understand how there can be multiple initials? When dscheck wants to commute the current operation with a previous one, it has to find the (unique) causal chain that allow the current operation to happen earlier (and then the unique initial is the first process of that chain).

I'm guessing the issues you are seeing (and the bias fix) is related to spawns and the chosen initial not being in s.enabled in the past, as scheduling an operation earlier may require the initial to be a parent domain (which will then spawn the current domain and enable the current operation to happen before the backtracked one)

The causal chain is a bit more tricky to determine than just following parent spawns, as the current operation can conditionally depend on some unrelated domain updating an atomic at the right time.

(... the wip code is a bit hard to read, I may be mistaken, but the initials set looks weird :P )

Hm so I don't understand how there can be multiple initials?

There's many ways to reverse a race and we'd want to explore fewer alternatives where possible. Say, we are in a sequence E.t.w.t', where E, w are some sequences of transitions, and t, t' are single transitions, which constitute a reversible race (act on the same variable, belong to different proc). The strongest way to execute the reversal of t, t' is to add t' to E.backtrack. This guarantees that E.t' will be or has been explored.

But that's not the only way. Suppose w is a single operation that is independent with all of our E, t, t' business. A spawn or just an access of a different var on a different thread. With that, we can explore the reversal as E.w.t' as well. Therefore, there are two initials = {t', w}.

Ha yes, you are right, I forgot that I was thinking of the shortest (strongest) sequence which is unique. So in your example, I don't think we need to consider the initial w?

We want to consider both initials to explore fewer interleavings. That's because the sequences E.w.t' and E.t'.w are equivalent and, generally, if we explore one initial, then we have explored both. I haven't penetrated the proof of it fully but it makes sense intuitively - throwing an indep operation in front of a racy program should not impact the result.

We're not seeing this above, without the bias patch, because our computation of indep and initials is too simplistic (essentially more or less forgetting about transitivity of happens-before). We end up with too many initials, and some of them belong to traces, rather than the one we want to schedule.

A curious observation: with intransitive happens-before the without-bias version is fine on even very complicated two-thread tests, but it's really easy to break it with 3+. That's the smallest offending test case I found:

let test () = let a = Atomic.make 0 in let b = Atomic.make 0 in Atomic.spawn (fun () -> Atomic.set b 1; Atomic.set a 2); Atomic.spawn (fun () -> Atomic.set a 4); Atomic.spawn (fun () -> Atomic.set b 8);

I have a patch that fixes the computation of indep and initials. I'll clean it up and push next week.

Yeah so with two domains, one just needs to mark the backtrackable steps as there's only one alternative domain to choose from (so anything works ^^). With more domains, you can have a complex situation where backtracking one domain operation requires running (or not running) other domains first. The source set paper you linked has an example at the top of page 7, with 4 domains, to illustrate why wake-up trees are necessary for optimality.

In any case, sleeping sets are already pretty good, but:

If the strongest initial is in the sleeping set, then one shouldn't attempt to backtrack there as it was already explored

If the chosen initial is not in the enabled set at that time, then backtracking will silently ignore the branch

Pending your next fix, without the bias, I think you are in the second situation where the strongest initial is not correctly computed for some backtrackable steps (another domain should have been picked as initial, which would have enabled the branch). With the bias, I think you end up scheduling more branches since it doesn't actually check that the operation was backtrackable at that time (but as long as the enabled set is not considered, then you can still pick a random initial that won't be able to run).

Yeah so with two domains, one just needs to mark the backtrackable steps as there's only one alternative domain to choose from (so anything works ^^).

Yep, basically there's no transitive case really. Something to be mindful of in the future.

bartoszmodelski · 2023-04-18T14:19:57Z

Backed out the previous attempt and fixed the root cause. We're now fully in line with the paper and should be good for proper review.

art-w · 2023-04-18T16:45:54Z

src/tracedAtomic.ml

+        | Some run_ptr -> dependent_vars := IntSet.add run_ptr !dependent_vars);
+
+      if happen_after then None else Some state_cell)
+    sequence


I think you need to distinguish between set and get operations and their effect on run_ptr, otherwise you could end up filtering out steps that could have been backtracked (and hence skip a branch)

We should not need to; as far as dpor or source dpor are concerned, every access is basically r/w. Are you referring to computation initials (or indep)?

Oh right, that's a future problem!

art-w

LGTM! 🎉

kayceesrk · 2023-05-17T09:32:40Z

@bartoszmodelski what needs to happen to merge this?

avsm · 2023-05-17T11:38:32Z

As a test, it might be worth re-checking this on cells.ml from Eio.Condition (ocaml-multicore/eio#397) as the previous versions OOMed on it.

bartoszmodelski · 2023-05-17T12:11:46Z

@bartoszmodelski what needs to happen to merge this?

This is based on two PRs that await review. One adds a dpor-validator, another test generator. Afaik @lyrm has it as a priority.

As a test, it might be worth re-checking this on cells.ml from Eio.Condition (ocaml-multicore/eio#397) as the previous versions OOMed on it.

I ran tests with the eio suite. To terminate these tests in reasonable time we need #22 as well (it's a draft but a fully tested one, I planned to prep it for review once this is merged but will do so soon).

kayceesrk · 2023-05-18T04:33:09Z

This is based on two PRs that await review. One adds a dpor-validator, another test generator. Afaik @lyrm has it as a priority.

Thanks. What are these two PRs? This PRs message says

(Based on #18, #19)

But this PR is #18.

bartoszmodelski · 2023-05-18T08:30:32Z

Oops, it's #19, #21. Amended the readme.

kayceesrk · 2023-05-30T08:47:19Z

Afaik @lyrm has it as a priority.

Hi @lyrm, wonder if you've had a chance to review this?

lyrm · 2023-05-30T09:16:28Z

I reviewed PR #19 and #21. I have meeting with @bartoszmodelski to talk about this PR and the future of dscheck today and tomorrow.

For this PR, I already had an informal look at it. I tried it with lockfree tests (including some that are not merged). It is working as expected : same results than before if bugs are introduced, way shorter than before if the tested code is working. The hash table tests from example were not finished after running for 36 hours and are now finishing in a few minutes.

I will do a short review of the code this afternoon but I won't have the time to do a deep review of the algorithm. However what PR #19 and #21 bring is here to give us some certainty about that. I will be mainly looking for missing documentation.

kayceesrk · 2023-05-30T09:20:23Z

Thanks @lyrm for the update.

May I please encourage updates in the Github thread even if the conversations are happening f2f or through other channels. Such updates help the community to engage with the project better. Thanks for the work on this so far.

lyrm

My understanding of the algorithm is still quite shallow but explore_dpor seems to follow properly the algorithm of the paper. Comments are definitely useful.

Here is a few general notes / questions :

you should mention somewhere that your variables are named according to the paper (with a link to it ?)
why is it useful to keep the previous algorithm ? (explore function and 'Dpor option for the trace function)

lyrm · 2023-05-31T22:25:50Z

src/tracedAtomic.ml

+      if happen_after then None else Some state_cell)
+    sequence
+
+let rec explore_source func state sleep_sets =


Why is sleep_sets a list ? I see where new sleep_set are added to it but I can't see where anything else then the last element of the list is used. In algorithm 1 of the paper, it is not a list.

I just tried with just a single sleep_set instead of a list and it does not seem to break anything.

Yeah, it does not have to be a list. Keeping just the one that's passed down works. Having said that:

It's useful for introspection; if anything looks odd I'd usually begin with printing backtrack and sleep set at every level.

Granular dependency relation branch ports a small optimization from nidhugg that uses this.

lyrm · 2023-05-31T22:29:49Z

src/tracedAtomic.ml

+let rec explore_source func state sleep_sets =
+  let sleep = ref (last_element sleep_sets) in
+  let s = last_element state in
+  let p_maybe = IntSet.min_elt_opt (IntSet.diff s.enabled !sleep) in


Is it an arbitrary choice ?

Yes. Technically could be a random choice. Same when selecting the initial.

lyrm · 2023-05-31T22:48:08Z

src/tracedAtomic.ml

+                    match proc'.run_ptr with
+                    | None -> false
+                    | Some run_ptr -> obj_ptr = run_ptr && proc'.run_proc <> p)
+                  new_state


This could be state instead of new_state as you know state_top is going to be filtered out.

lyrm · 2023-05-31T22:55:03Z

src/tracedAtomic.ml

+               sequences from E.
+            *)
+            let initials =
+              let rec f = function


(loop or aux are better then f as, at least, you can make a search on it)

bartoszmodelski · 2023-06-01T13:32:48Z

you should mention somewhere that your variables are named according to the paper (with a link to it ?)

Good idea, added a note at the top.

why is it useful to keep the previous algorithm ? (explore function and 'Dpor option for the trace function)

I left it in for reference and to make the review easier (removing it here increases size of the patch and opens the doors for changing other parts of the code). Still at this point in development I found it useful to sometimes switch over to the old DPOR and confirm that we are indeed doing the correct thing. I would not oppose removing it in a bit though.

kayceesrk · 2023-06-27T06:46:25Z

Ping. The conversation seems to have gotten stuck on this PR. Can we get this to a merge soon?

lyrm · 2023-06-27T09:11:46Z

I think the conversion is over. I can merge this today.

bartoszmodelski force-pushed the source-set branch 2 times, most recently from e4abc64 to 337fc79 Compare April 5, 2023 17:17

art-w reviewed Apr 11, 2023

View reviewed changes

bartoszmodelski force-pushed the source-set branch from 7220d3a to 3985ea8 Compare April 18, 2023 13:15

bartoszmodelski changed the base branch from main to program-gen April 18, 2023 13:16

bartoszmodelski changed the base branch from program-gen to main April 18, 2023 13:16

bartoszmodelski marked this pull request as ready for review April 18, 2023 13:18

bartoszmodelski requested a review from a team April 18, 2023 14:09

art-w reviewed Apr 18, 2023

View reviewed changes

art-w approved these changes Apr 19, 2023

View reviewed changes

lyrm mentioned this pull request Apr 20, 2023

Lockfree Skip List ocaml-multicore/saturn#65

Closed

7 tasks

bartoszmodelski force-pushed the source-set branch from 3985ea8 to eef30e8 Compare April 24, 2023 14:21

lyrm approved these changes May 31, 2023

View reviewed changes

Source sets

fefa925

bartoszmodelski force-pushed the source-set branch from eef30e8 to fefa925 Compare June 1, 2023 11:51

address feedback

9e18101

bartoszmodelski merged commit 40073b7 into main Jul 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Source sets #18

Source sets #18

bartoszmodelski commented Mar 22, 2023 •

edited

Loading

art-w commented Mar 23, 2023

bartoszmodelski commented Apr 4, 2023

bartoszmodelski commented Apr 5, 2023

kayceesrk commented Apr 7, 2023

bartoszmodelski commented Apr 7, 2023 •

edited

Loading

art-w Apr 11, 2023

bartoszmodelski Apr 13, 2023 •

edited

Loading

art-w Apr 13, 2023

bartoszmodelski Apr 15, 2023

art-w Apr 18, 2023

bartoszmodelski Apr 18, 2023

bartoszmodelski commented Apr 18, 2023 •

edited

Loading

art-w Apr 18, 2023

bartoszmodelski Apr 18, 2023

art-w Apr 19, 2023

art-w left a comment

kayceesrk commented May 17, 2023

avsm commented May 17, 2023

bartoszmodelski commented May 17, 2023

kayceesrk commented May 18, 2023 •

edited

Loading

bartoszmodelski commented May 18, 2023

kayceesrk commented May 30, 2023

lyrm commented May 30, 2023 •

edited

Loading

kayceesrk commented May 30, 2023

lyrm left a comment

lyrm May 31, 2023

bartoszmodelski Jun 1, 2023

lyrm May 31, 2023

bartoszmodelski Jun 1, 2023

lyrm May 31, 2023

lyrm May 31, 2023

bartoszmodelski commented Jun 1, 2023

kayceesrk commented Jun 27, 2023

lyrm commented Jun 27, 2023

Source sets #18

Source sets #18

Conversation

bartoszmodelski commented Mar 22, 2023 • edited Loading

Results

Test 1

Test 2

Existing tests

Logic

Other implementations

Testing

Future work

art-w commented Mar 23, 2023

bartoszmodelski commented Apr 4, 2023

bartoszmodelski commented Apr 5, 2023

kayceesrk commented Apr 7, 2023

bartoszmodelski commented Apr 7, 2023 • edited Loading

Choose a reason for hiding this comment

bartoszmodelski Apr 13, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bartoszmodelski commented Apr 18, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

art-w left a comment

Choose a reason for hiding this comment

kayceesrk commented May 17, 2023

avsm commented May 17, 2023

bartoszmodelski commented May 17, 2023

kayceesrk commented May 18, 2023 • edited Loading

bartoszmodelski commented May 18, 2023

kayceesrk commented May 30, 2023

lyrm commented May 30, 2023 • edited Loading

kayceesrk commented May 30, 2023

lyrm left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bartoszmodelski commented Jun 1, 2023

kayceesrk commented Jun 27, 2023

lyrm commented Jun 27, 2023

bartoszmodelski commented Mar 22, 2023 •

edited

Loading

bartoszmodelski commented Apr 7, 2023 •

edited

Loading

bartoszmodelski Apr 13, 2023 •

edited

Loading

bartoszmodelski commented Apr 18, 2023 •

edited

Loading

kayceesrk commented May 18, 2023 •

edited

Loading

lyrm commented May 30, 2023 •

edited

Loading