Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix weak memory emulation to avoid generating behaviors that are forbidden under C++ 20 #4057

Merged
merged 3 commits into from
Dec 6, 2024

Conversation

RalfJung
Copy link
Member

@RalfJung RalfJung commented Nov 24, 2024

After discussion with Viktor Vafeiadis and Michalis Kokologiannakis, we concluded that the only thing Miri still needs to do to be sound wrt C++20 is treat SC fences as RMWs to a global location -- i.e., have a global unique clock that each RMW releases to and acquires from. This is unnecessarily strong (it rules out some behaviors that would be legal), but it is currently unclear how to do better, and we still pass all our tests for generating weak behaviors so hopefully we'll still get good test coverage for other cases.

@cbeuw does this make sense to you?

Fixes #2301

@RalfJung RalfJung force-pushed the scfix branch 5 times, most recently from 093d86f to 55a7f3e Compare November 24, 2024 16:57
//! that C++20 intended to copy (<https://plv.mpi-sws.org/scfix/paper.pdf>); a change was
//! introduced when translating the math to English. According to Viktor Vafeiadis, this
//! difference is harmless. So we stick to what the standard says, and allow fewer behaviors.)
//! - SC fences are treated like RMWs to a global clock, to ensure they induce enough
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More specifically, they are treated as AcqRel RMWs right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point, I should say that.

@cbeuw
Copy link
Contributor

cbeuw commented Dec 5, 2024

My main reservation here is that I believe this is effectively the same SC fence treatment as proposed by Promising Semantics. And previously (when I remembered all of these stuff better than I do now) I said #2301 (comment)

SC fences are better behaved. Adapting Promising Semantics' SC fences is possible and can actually simplify our code. The bit I'm not fully comfortable with is the interaction between SC fences and SC accesses, which are fairly intertwined.

There could be an example involving both SC fences and SC accesses that exhibits C++20-forbidden behaviour. I think it's worth asking Viktor and Michalis what they think. I'll try to come up with such an example as well.

Though in any case, I'm happy for this to be merged, but without promising that we're fully C++20-compatible for programs that have both SC fences and SC accesses

@RalfJung
Copy link
Member Author

RalfJung commented Dec 6, 2024

Thanks for taking a look!

There could be an example involving both SC fences and SC accesses that exhibits C++20-forbidden behaviour. I think it's worth asking Viktor and Michalis what they think. I'll try to come up with such an example as well.

When I asked them, they said this should be enough, but I didn't ask specifically about the interactions. If a thread does first an SC fence and then an SC access, they do interact via the local clock after all, so in my understanding that should be enough.

We're generally not promising to be bug-free so I think this should suffice.

I can ask Michalis specifically about this again next time I see him.

@RalfJung RalfJung added this pull request to the merge queue Dec 6, 2024
Merged via the queue into rust-lang:master with commit 8b26f5c Dec 6, 2024
7 checks passed
@RalfJung RalfJung deleted the scfix branch December 6, 2024 07:10
@orilahav
Copy link

orilahav commented Dec 6, 2024

Note that SC fence also serve as a release+acquire fence, so you might need a bit more.
See second bullet on page 10 here:
https://www.cs.tau.ac.il/~orilahav/papers/popl21_robustness.pdf

@RalfJung
Copy link
Member Author

RalfJung commented Dec 6, 2024

We do have this:

// Apply data-race detection for the current fences
// this treats AcqRel and SeqCst as the same as an acquire
// and release fence applied in the same timestamp.
if atomic != AtomicFenceOrd::Release {
// Either Acquire | AcqRel | SeqCst
clocks.apply_acquire_fence();
}
if atomic != AtomicFenceOrd::Acquire {
// Either Release | AcqRel | SeqCst
clocks.apply_release_fence();
}
if atomic == AtomicFenceOrd::SeqCst {
// Behave like an RMW on the global fence location. This takes full care of
// all the SC fence requirements, including C++17 §32.4 [atomics.order]
// paragraph 6 (which would limit what future reads can see). It also rules
// out many legal behaviors, but we don't currently have a model that would
// be more precise.
let mut sc_fence_clock = data_race.last_sc_fence.borrow_mut();
sc_fence_clock.join(&clocks.clock);
clocks.clock.join(&sc_fence_clock);
// Also establish some sort of order with the last SC write that happened, globally
// (but this is only respected by future reads).
clocks.write_seqcst.join(&data_race.last_sc_write_per_thread.borrow());
}

But it doesn't quite have the right order, compared to the paper. I guess we have to move the release fence to after the SC logic.

Is it possible to construct an example that is broken due to this?

@orilahav
Copy link

orilahav commented Dec 6, 2024

perhaps this one:

Rk1 acq
Ry0 rlx
||
Wx1 rlx
SC-fence
Wk1 rlx
||
Wy1 rlx
SC-fence
Wz1 rlx
||
Rz1 acq
Rx0 rlx

with preceding F-rel the behavior is allowed
with following F-rel the behavior is disallowed

@RalfJung RalfJung mentioned this pull request Dec 6, 2024
@RalfJung
Copy link
Member Author

RalfJung commented Dec 6, 2024

To my great surprise Miri does indeed produce that behavior. :) I didn't think we could be that weak. And #4076 fixes it. Thanks a lot!

While you are here @orilahav , could you quickly confirm whether there is anything particular about the interaction of SC fences and SC accesses that we need to be worried about? Like, are they in some sort of global order whose effect changed with C++20?

@orilahav
Copy link

orilahav commented Dec 7, 2024

I'm not sure what you mean exactly.

  1. The treatment of SC accesses in C++20 is broken (unsound for x86)...
    (at least the text in https://en.cppreference.com/w/cpp/atomic/memory_order)

Their axioms are similar to what we mentioned in the RC11 paper to be too strong
(https://plv.mpi-sws.org/scfix/paper.pdf, Remark 4).

Hans Boehm knows about this issue, and I belive it is planned to be fixed at some point.
(Maybe already discussed in a standard committee meeting?)

  1. With RMWs for SC-fences you get a memory model incomparable to RC11.
Wx1 sc
Ry0 sc
||
Wy1 rlx
SC-fence
Rx0 rlx

disallowed in RC11
allowed with RMWs for SC-fences

But:

x:=1 na
SC-fence
y:=1 rlx
a:=y rlx
if a=2 then
  k:=1 rlx
||
y:=2 rlx
SC-fence
b:=k rlx
if b=1 then
  c:=x na

racy in RC11
not-racy with RMWs for SC-fences

  1. Note that there is also the question of acyclic(poUrf).
    RC11 has this condition, to conservatively solve the thin-air problem.
    C++20 doesn't have it.
    Some examples with SC-fences are related to this choice.
Wz1 rlx
SC-fence
Wx1 rlx
||
Rx1 rlx
Wy1 rlx
||
Ry1 rlx
SC-fence
Rz0 rlx

allowed in RC11
disallowed with RMWs for SC-fences (when you include acyclic(poUrf))


I hope my examples are correct. If you draw conclusions from them, we need to check more carefully :)

@RalfJung
Copy link
Member Author

RalfJung commented Dec 7, 2024

The treatment of SC accesses in C++20 is broken (unsound for x86)...

Oh gosh I didn't realize it was so bad.^^

I'm not sure what you mean exactly.

What I mean is this: we are looking for a model (supporting all fences and all accesses) with the following properties:

  1. most importantly: all behaviors generated by the model are allowed by C++20 (but not necessarily the other way around)
  2. the model can be implemented using vector clocks or similar, i.e. we don't have to track and maintain an undue amount of state
  3. the model should generate as many (legal) behaviors as reasonably possible

Our starting point is this paper. However, C++20 has forbidden some behaviors that were previously allowed, so this violates the first property above. To fix that we have done two changes:

  • If an SC load reads from an atomic store of any ordering, then a later SC load cannot read
    from an earlier store in the location's modification order. This is to prevent creating a
    backwards S edge from the second load to the first, as a result of C++20's coherence-ordered
    before rules. (This seems to rule out behaviors that were actually permitted by the RC11 model
    that C++20 intended to copy (https://plv.mpi-sws.org/scfix/paper.pdf); a change was
    introduced when translating the math to English. According to Viktor Vafeiadis, this
    difference is harmless. So we stick to what the standard says, and allow fewer behaviors.)
  • SC fences are treated like AcqRel RMWs to a global clock, to ensure they induce enough
    synchronization with the surrounding accesses. This rules out legal behavior, but it is really
    hard to be more precise here.

The question is, is this enough? We changed SC accesses and SC fences, but maybe the interplay of the two changed in more subtle ways in C++20 and that could still lead to us generating behaviors that C++20 does not allow.

@orilahav
Copy link

orilahav commented Dec 7, 2024

I suspect the first example in my previous comment shows a problem in your approach, right?
It is disallowed by C++20, but I think your algorithm allows this behavior.

(More generally speaking, I don't really understand why one should care about C++20 when we know it is broken.)

@RalfJung
Copy link
Member Author

RalfJung commented Dec 7, 2024

I hope my examples are correct. If you draw conclusions from them, we need to check more carefully :)

I have a hard time understanding bare examples like that, so the risk that I may be able to draw my own conclusions from them is rather low. ;)

I suspect the first example in my previous comment shows a problem in your approach, right?
It is disallowed by C++20, but I think your algorithm allows this behavior.

That looks exactly like the kind of example @cbeuw was worried about, thanks!

I can this test 1000 times and never saw the bad behavior. So I think we are fine. I'll have to figure out why, though...

I don't really understand why one should care about C++20 when we know it is broken

Because it's the model Rust officially subscribes to... because developing our own model is hard and also we need to be able to share memory with C/C++ programs.

@cbeuw
Copy link
Contributor

cbeuw commented Dec 7, 2024

The first example is prevented in our implementation:

} else if store_elem.timestamp <= clocks.write_seqcst[store_elem.store_index]
&& store_elem.is_seqcst
{
// The current non-SC load, which may be sequenced-after an SC fence,
// cannot read-before the last SC store executed before the fence.
// C++17 §32.4 [atomics.order] paragraph 4
false

@RalfJung
Copy link
Member Author

RalfJung commented Dec 7, 2024

So here's the explanation:

  • We have a global last_sc_write_per_thread clock. Usually this is only updated by SC writes: when a thread does an SC write, it sets its own index in that clock to its current time. (It doesn't update the value for any other thread in that clock. So, this is like joining last_sc_write_per_thread with a clock that is 0 everywhere except for the thread that did the SC write).
  • On an SC fence, we join the current global last_sc_write_per_thread the write_seqcst of the current thread.
  • On a read, when searching the store buffer for a read we may use, if we encounter an element that was written with SC ordering and that is older than the per-thread write_seqcst, we do not consider older entries. That is the code @cbeuw just posted.

@RalfJung
Copy link
Member Author

RalfJung commented Dec 7, 2024

IIUC, this check was already present in the original POPL paper? #2512 only changed the comments here, not the logic, as far as I can tell.

@orilahav
Copy link

orilahav commented Dec 7, 2024

I don't really understand why one should care about C++20 when we know it is broken

Because it's the model Rust officially subscribes to... because developing our own model is hard and also we need to be able to share memory with C/C++ programs.

This is a strange reason in my mind. C++20 compilation to x86 is currently wrong!
That's much more important than unsoundness in Miri, and there is no reason to copy wrong C++20 to Miri.
You can use RC11 instead of C++20.

  • On an SC fence, we join the current global last_sc_write_per_thread the write_seqcst of the current thread.

Then, you do more than the RMW encoding you mentioned, bcs an RMW (+ rel/acq fences) won't do that (right?).
In that case it's hard for me to judge correctness, bcs it is not clear to me what is the declarative model your algorithm follows.

@orilahav
Copy link

orilahav commented Dec 7, 2024

Here are more complicated examples to try (both disallowed in RC11 IIUC):

Wx1 sc
Ry0 sc
||
Wy1 rlx
Wz1 rel
||
Rz1 acq
SC-fence
Rx0 rlx
Wx1 sc
Ry0 sc
||
Wy1 rlx
F-sc
Rz0 rlx
||
Wz1 rlx
||
Rz1 rlx
F-sc
Rx0 rlx

@RalfJung
Copy link
Member Author

RalfJung commented Dec 7, 2024 via email

@orilahav
Copy link

orilahav commented Dec 8, 2024

Maybe we should track this x86 problem somewhere... Is there a document or blog post or so that describes the problem?

This is what I was able to track:
https://cplusplus.github.io/LWG/lwg-active.html#3941

@RalfJung
Copy link
Member Author

RalfJung commented Dec 12, 2024

Here are more complicated examples to try (both disallowed in RC11 IIUC):

Miri seems to not generate either of them.

I think I understand the first of these tests (it checks the combination of SC accesses/fences being ordered with a release-acquire pair), but I don't get the second one. The lonely relaxed store in thread 3 is particularly confusing. Could you explain a bit what kind of a situation this test is targeting?

@RalfJung
Copy link
Member Author

Okay I think I understand why that execution is impossible, I added some comments in #4090 (the test name there is test_sc_multi_fence). Would be great if you could take a look. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Weak memory emulation does not respect C++20 SCfix
3 participants