-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Source sets #18
Source sets #18
Conversation
Very cool!! This is really subtle to review, I like the strategy but I don't fully understand the code, there's some weird bug lurking somewhere: module Atomic = Dscheck.TracedAtomic
let test () =
let a = Atomic.make 0 in
let b = Atomic.make 0 in
let c = Atomic.make 0 in
let ok = Atomic.make false in
Atomic.spawn (fun () ->
Atomic.set a 1 ;
Atomic.set b 1 ;
);
Atomic.spawn (fun () ->
Atomic.set c 1 ;
Atomic.set c 2 ;
Atomic.set c 3 ;
Atomic.set b 2 ;
);
Atomic.spawn (fun () ->
if Atomic.get c = 2 && Atomic.get b = 0
then Atomic.set ok true
);
Atomic.final (fun () ->
Format.printf "a=%i b=%i c=%i ok=%b@."
(Atomic.get a)
(Atomic.get b)
(Atomic.get c)
(Atomic.get ok)
)
let () = Atomic.trace test
(*
a=1 b=2 c=3 ok=false
a=1 b=2 c=3 ok=false
a=1 b=2 c=3 ok=false
a=1 b=2 c=3 ok=false
a=1 b=2 c=3 ok=false
a=1 b=1 c=3 ok=false
num_traces: 6
*) There should be a window for Do you think you could test it with #15? There are some existing bugs in dscheck so it's hard to tell if an optimization is correct or if it introduces new issues :/ |
Yeah, this is not looking correct. I managed to simplify the failing case to the following: let test () =
let c = Atomic.make 0 in
let b = Atomic.make 0 in
let ok = Atomic.make false in
Atomic.spawn (fun () -> Atomic.set b 1);
Atomic.spawn (fun () ->
Atomic.set c 1;
Atomic.set b 2);
Atomic.spawn (fun () ->
if Atomic.get c = 0 then if Atomic.get b = 0 then Atomic.set ok true);
Atomic.final (fun () ->
Format.printf "b=%i c=%i ok=%b@." (Atomic.get b) (Atomic.get c)
(Atomic.get ok)) In the 3/4 trace we observe P2's accessing |
e4abc64
to
337fc79
Compare
Ok, I have some idea about what's happening. The general principle behind DPOR algos is discovery of reversible races. Whenever we discover one, we need to ensure that a new trace containing the reversal gets explored at some point. Source-DPOR generates "initials" of execution histories leading to reversal and if none of the initials is in the relevant backtrack, one (arbitrarily chosen) initial gets added. That should lead to the new trace. However, it is not what I'm seeing in the case above. There, the fourth (last) trace generates two initials: 0, 2. But 0 is in the sleep set already and inserting it into backtrack does not have any impact. On the other hand, choosing 2 lets us explore all the traces (while still visiting only 7% of the execution histories that vanilla DPOR sees). Afaict we generate the initials and sleep sets correctly, thus I've added a bias to prefer to backtrack unblocked/unexplored transitions and that seems to help. I'll do more automated testing. |
This looks very promising. Does this algorithm explore all the "interesting" schedules that the original algorithm explores? |
As far as the tests we have, yes. I'm a little uneasy with the fact that we had to specialize on the algo from the paper to handle the case above correctly. I'm working on tooling to automatically search for programs (borrowing ideas from #14), where source-dpor explores less traces than the current implementation. (Incidentally, it confirms that the 12k execution sequences we explore for michael scott indeed fold into just 16 traces). Related: #19 |
src/tracedAtomic.ml
Outdated
IntSet.(min_elt_opt (diff initials prefix_top_sleep)) | ||
with | ||
| Some initial -> initial | ||
| None -> IntSet.min_elt initials |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm so I don't understand how there can be multiple initials? When dscheck wants to commute the current operation with a previous one, it has to find the (unique) causal chain that allow the current operation to happen earlier (and then the unique initial is the first process of that chain).
I'm guessing the issues you are seeing (and the bias fix) is related to spawns and the chosen initial not being in s.enabled
in the past, as scheduling an operation earlier may require the initial to be a parent domain (which will then spawn the current domain and enable the current operation to happen before the backtracked one)
The causal chain is a bit more tricky to determine than just following parent spawns, as the current operation can conditionally depend on some unrelated domain updating an atomic at the right time.
(... the wip code is a bit hard to read, I may be mistaken, but the initials set looks weird :P )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm so I don't understand how there can be multiple initials?
There's many ways to reverse a race and we'd want to explore fewer alternatives where possible. Say, we are in a sequence E.t.w.t', where E, w are some sequences of transitions, and t, t' are single transitions, which constitute a reversible race (act on the same variable, belong to different proc). The strongest way to execute the reversal of t, t' is to add t' to E.backtrack. This guarantees that E.t' will be or has been explored.
But that's not the only way. Suppose w is a single operation that is independent with all of our E, t, t' business. A spawn or just an access of a different var on a different thread. With that, we can explore the reversal as E.w.t' as well. Therefore, there are two initials = {t', w}.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha yes, you are right, I forgot that I was thinking of the shortest (strongest) sequence which is unique. So in your example, I don't think we need to consider the initial w
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to consider both initials to explore fewer interleavings. That's because the sequences E.w.t' and E.t'.w are equivalent and, generally, if we explore one initial, then we have explored both. I haven't penetrated the proof of it fully but it makes sense intuitively - throwing an indep operation in front of a racy program should not impact the result.
We're not seeing this above, without the bias patch, because our computation of indep and initials is too simplistic (essentially more or less forgetting about transitivity of happens-before). We end up with too many initials, and some of them belong to traces, rather than the one we want to schedule.
A curious observation: with intransitive happens-before the without-bias version is fine on even very complicated two-thread tests, but it's really easy to break it with 3+. That's the smallest offending test case I found:
let test () =
let a = Atomic.make 0 in
let b = Atomic.make 0 in
Atomic.spawn (fun () ->
Atomic.set b 1;
Atomic.set a 2);
Atomic.spawn (fun () -> Atomic.set a 4);
Atomic.spawn (fun () -> Atomic.set b 8);
I have a patch that fixes the computation of indep and initials. I'll clean it up and push next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah so with two domains, one just needs to mark the backtrackable steps as there's only one alternative domain to choose from (so anything works ^^). With more domains, you can have a complex situation where backtracking one domain operation requires running (or not running) other domains first. The source set paper you linked has an example at the top of page 7, with 4 domains, to illustrate why wake-up trees are necessary for optimality.
In any case, sleeping sets are already pretty good, but:
- If the strongest initial is in the sleeping set, then one shouldn't attempt to backtrack there as it was already explored
- If the chosen initial is not in the enabled set at that time, then backtracking will silently ignore the branch
Pending your next fix, without the bias, I think you are in the second situation where the strongest initial is not correctly computed for some backtrackable steps (another domain should have been picked as initial, which would have enabled the branch). With the bias, I think you end up scheduling more branches since it doesn't actually check that the operation was backtrackable at that time (but as long as the enabled set is not considered, then you can still pick a random initial that won't be able to run).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah so with two domains, one just needs to mark the backtrackable steps as there's only one alternative domain to choose from (so anything works ^^).
Yep, basically there's no transitive case really. Something to be mindful of in the future.
7220d3a
to
3985ea8
Compare
Backed out the previous attempt and fixed the root cause. We're now fully in line with the paper and should be good for proper review. |
| Some run_ptr -> dependent_vars := IntSet.add run_ptr !dependent_vars); | ||
|
||
if happen_after then None else Some state_cell) | ||
sequence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to distinguish between set and get operations and their effect on run_ptr
, otherwise you could end up filtering out steps that could have been backtracked (and hence skip a branch)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not need to; as far as dpor or source dpor are concerned, every access is basically r/w. Are you referring to computation initials (or indep)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, that's a future problem!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 🎉
3985ea8
to
eef30e8
Compare
@bartoszmodelski what needs to happen to merge this? |
As a test, it might be worth re-checking this on cells.ml from Eio.Condition (ocaml-multicore/eio#397) as the previous versions OOMed on it. |
This is based on two PRs that await review. One adds a dpor-validator, another test generator. Afaik @lyrm has it as a priority.
I ran tests with the eio suite. To terminate these tests in reasonable time we need #22 as well (it's a draft but a fully tested one, I planned to prep it for review once this is merged but will do so soon). |
I reviewed PR #19 and #21. I have meeting with @bartoszmodelski to talk about this PR and the future of For this I will do a short review of the code this afternoon but I won't have the time to do a deep review of the algorithm. However what PR #19 and #21 bring is here to give us some certainty about that. I will be mainly looking for missing documentation. |
Thanks @lyrm for the update. May I please encourage updates in the Github thread even if the conversations are happening f2f or through other channels. Such updates help the community to engage with the project better. Thanks for the work on this so far. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of the algorithm is still quite shallow but explore_dpor
seems to follow properly the algorithm of the paper. Comments are definitely useful.
Here is a few general notes / questions :
- you should mention somewhere that your variables are named according to the paper (with a link to it ?)
- why is it useful to keep the previous algorithm ? (
explore
function and'Dpor
option for thetrace
function)
if happen_after then None else Some state_cell) | ||
sequence | ||
|
||
let rec explore_source func state sleep_sets = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is sleep_sets
a list ? I see where new sleep_set
are added to it but I can't see where anything else then the last element of the list is used. In algorithm 1 of the paper, it is not a list.
I just tried with just a single sleep_set instead of a list and it does not seem to break anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it does not have to be a list. Keeping just the one that's passed down works. Having said that:
- It's useful for introspection; if anything looks odd I'd usually begin with printing backtrack and sleep set at every level.
- Granular dependency relation branch ports a small optimization from nidhugg that uses this.
let rec explore_source func state sleep_sets = | ||
let sleep = ref (last_element sleep_sets) in | ||
let s = last_element state in | ||
let p_maybe = IntSet.min_elt_opt (IntSet.diff s.enabled !sleep) in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it an arbitrary choice ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Technically could be a random choice. Same when selecting the initial.
src/tracedAtomic.ml
Outdated
match proc'.run_ptr with | ||
| None -> false | ||
| Some run_ptr -> obj_ptr = run_ptr && proc'.run_proc <> p) | ||
new_state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be state
instead of new_state
as you know state_top
is going to be filtered out.
src/tracedAtomic.ml
Outdated
sequences from E. | ||
*) | ||
let initials = | ||
let rec f = function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(loop
or aux
are better then f
as, at least, you can make a search on it)
eef30e8
to
fefa925
Compare
Good idea, added a note at the top.
I left it in for reference and to make the review easier (removing it here increases size of the patch and opens the doors for changing other parts of the code). Still at this point in development I found it useful to sometimes switch over to the old DPOR and confirm that we are indeed doing the correct thing. I would not oppose removing it in a bit though. |
Ping. The conversation seems to have gotten stuck on this PR. Can we get this to a merge soon? |
I think the conversion is over. I can merge this today. |
(Based on #19, #21)
PR replacing our current persistent sets with source sets as outlined in the Source Sets: A Foundation for Optimal Dynamic Partial Order Reuction. Source sets achieve a better reduction in general and optimal reduction on some tests.
Part of the improvement can be definitely attributed to the fact that spawns are not treated as independent operations and are not reordered with atomic operations (unless forced by required reorder of atomic operations).
Results
Test 1
(write A, write B), (write B, write A)
main
Source-set
Test 2
(write A, write B), (write A, write B)
main:
Source-set
Existing tests
Logic
Just like with the previous algorithm, we execute the program on a single core and take control at calls to Atomic in order to force certain interleavings to be explored. We do that by stopping after every operation, looking at the most recent races and scheduling a trace that reverses the two racing accesses. Races here are pretty much accesses to shared variables made by two different threads.
We depart from the previous algorithm in the way sequences are scheduled. Suppose we have the following interleaving E.t.w.t', where t and t' are a race and w is an independent operation (say spawn of another thread).
Above may sound quite abstract. There is a lot of comments in the code that explain step-by-step how we achieve both of these things. Having said that, I would recommend at least skimming the sections 2-5 in the paper, even if just to double check me.
Other implementations
The paper focuses on the Concuerror project, which I didn't find particularly helpful as it focuses on Erlang's CSP. However, the same research group produced https://github.com/nidhugg/nidhugg, which is really close to what we want.
Testing
All existing tests pass and explore the same traces as DPOR. I've also used program_gen a lot, both to compare with existing dpor for smaller programs and with random exploration for larger ones (that source dpor can terminate on, while existing dpor not quite).
Future work
There's a few possible next steps: