Fix weak memory emulation to avoid generating behaviors that are forbidden under C++ 20 #4057

RalfJung · 2024-11-24T14:43:20Z

After discussion with Viktor Vafeiadis and Michalis Kokologiannakis, we concluded that the only thing Miri still needs to do to be sound wrt C++20 is treat SC fences as RMWs to a global location -- i.e., have a global unique clock that each RMW releases to and acquires from. This is unnecessarily strong (it rules out some behaviors that would be legal), but it is currently unclear how to do better, and we still pass all our tests for generating weak behaviors so hopefully we'll still get good test coverage for other cases.

@cbeuw does this make sense to you?

Fixes #2301

cbeuw · 2024-12-05T00:10:58Z

src/concurrency/weak_memory.rs

+//!   that C++20 intended to copy (<https://plv.mpi-sws.org/scfix/paper.pdf>); a change was
+//!   introduced when translating the math to English. According to Viktor Vafeiadis, this
+//!   difference is harmless. So we stick to what the standard says, and allow fewer behaviors.)
+//! - SC fences are treated like RMWs to a global clock, to ensure they induce enough


More specifically, they are treated as AcqRel RMWs right?

Yes, good point, I should say that.

…ted at the top of the file

cbeuw · 2024-12-05T23:00:41Z

My main reservation here is that I believe this is effectively the same SC fence treatment as proposed by Promising Semantics. And previously (when I remembered all of these stuff better than I do now) I said #2301 (comment)

SC fences are better behaved. Adapting Promising Semantics' SC fences is possible and can actually simplify our code. The bit I'm not fully comfortable with is the interaction between SC fences and SC accesses, which are fairly intertwined.

There could be an example involving both SC fences and SC accesses that exhibits C++20-forbidden behaviour. I think it's worth asking Viktor and Michalis what they think. I'll try to come up with such an example as well.

Though in any case, I'm happy for this to be merged, but without promising that we're fully C++20-compatible for programs that have both SC fences and SC accesses

RalfJung · 2024-12-06T06:44:20Z

Thanks for taking a look!

There could be an example involving both SC fences and SC accesses that exhibits C++20-forbidden behaviour. I think it's worth asking Viktor and Michalis what they think. I'll try to come up with such an example as well.

When I asked them, they said this should be enough, but I didn't ask specifically about the interactions. If a thread does first an SC fence and then an SC access, they do interact via the local clock after all, so in my understanding that should be enough.

We're generally not promising to be bug-free so I think this should suffice.

I can ask Michalis specifically about this again next time I see him.

orilahav · 2024-12-06T07:14:30Z

Note that SC fence also serve as a release+acquire fence, so you might need a bit more.
See second bullet on page 10 here:
https://www.cs.tau.ac.il/~orilahav/papers/popl21_robustness.pdf

RalfJung · 2024-12-06T08:56:08Z

We do have this:

miri/src/concurrency/data_race.rs

Lines 870 to 893 in 4e7f53a

    
           // Apply data-race detection for the current fences 
        
           // this treats AcqRel and SeqCst as the same as an acquire 
        
           // and release fence applied in the same timestamp. 
        
           if atomic != AtomicFenceOrd::Release { 
        
               // Either Acquire | AcqRel | SeqCst 
        
               clocks.apply_acquire_fence(); 
        
           } 
        
           if atomic != AtomicFenceOrd::Acquire { 
        
               // Either Release | AcqRel | SeqCst 
        
               clocks.apply_release_fence(); 
        
           } 
        
           if atomic == AtomicFenceOrd::SeqCst { 
        
               // Behave like an RMW on the global fence location. This takes full care of 
        
               // all the SC fence requirements, including C++17 §32.4 [atomics.order] 
        
               // paragraph 6 (which would limit what future reads can see). It also rules 
        
               // out many legal behaviors, but we don't currently have a model that would 
        
               // be more precise. 
        
               let mut sc_fence_clock = data_race.last_sc_fence.borrow_mut(); 
        
               sc_fence_clock.join(&clocks.clock); 
        
               clocks.clock.join(&sc_fence_clock); 
        
               // Also establish some sort of order with the last SC write that happened, globally 
        
               // (but this is only respected by future reads). 
        
               clocks.write_seqcst.join(&data_race.last_sc_write_per_thread.borrow()); 
        
           }

But it doesn't quite have the right order, compared to the paper. I guess we have to move the release fence to after the SC logic.

Is it possible to construct an example that is broken due to this?

orilahav · 2024-12-06T16:59:48Z

perhaps this one:

Rk1 acq
Ry0 rlx
||
Wx1 rlx
SC-fence
Wk1 rlx
||
Wy1 rlx
SC-fence
Wz1 rlx
||
Rz1 acq
Rx0 rlx

with preceding F-rel the behavior is allowed
with following F-rel the behavior is disallowed

RalfJung · 2024-12-06T18:53:00Z

To my great surprise Miri does indeed produce that behavior. :) I didn't think we could be that weak. And #4076 fixes it. Thanks a lot!

While you are here @orilahav , could you quickly confirm whether there is anything particular about the interaction of SC fences and SC accesses that we need to be worried about? Like, are they in some sort of global order whose effect changed with C++20?

orilahav · 2024-12-07T09:30:16Z

I'm not sure what you mean exactly.

The treatment of SC accesses in C++20 is broken (unsound for x86)...
(at least the text in https://en.cppreference.com/w/cpp/atomic/memory_order)

Their axioms are similar to what we mentioned in the RC11 paper to be too strong
(https://plv.mpi-sws.org/scfix/paper.pdf, Remark 4).

Hans Boehm knows about this issue, and I belive it is planned to be fixed at some point.
(Maybe already discussed in a standard committee meeting?)

With RMWs for SC-fences you get a memory model incomparable to RC11.

Wx1 sc
Ry0 sc
||
Wy1 rlx
SC-fence
Rx0 rlx

disallowed in RC11
allowed with RMWs for SC-fences

But:

x:=1 na
SC-fence
y:=1 rlx
a:=y rlx
if a=2 then
  k:=1 rlx
||
y:=2 rlx
SC-fence
b:=k rlx
if b=1 then
  c:=x na

racy in RC11
not-racy with RMWs for SC-fences

Note that there is also the question of acyclic(poUrf).
RC11 has this condition, to conservatively solve the thin-air problem.
C++20 doesn't have it.
Some examples with SC-fences are related to this choice.

Wz1 rlx
SC-fence
Wx1 rlx
||
Rx1 rlx
Wy1 rlx
||
Ry1 rlx
SC-fence
Rz0 rlx

allowed in RC11
disallowed with RMWs for SC-fences (when you include acyclic(poUrf))

I hope my examples are correct. If you draw conclusions from them, we need to check more carefully :)

RalfJung · 2024-12-07T10:12:38Z

The treatment of SC accesses in C++20 is broken (unsound for x86)...

Oh gosh I didn't realize it was so bad.^^

I'm not sure what you mean exactly.

What I mean is this: we are looking for a model (supporting all fences and all accesses) with the following properties:

most importantly: all behaviors generated by the model are allowed by C++20 (but not necessarily the other way around)
the model can be implemented using vector clocks or similar, i.e. we don't have to track and maintain an undue amount of state
the model should generate as many (legal) behaviors as reasonably possible

Our starting point is this paper. However, C++20 has forbidden some behaviors that were previously allowed, so this violates the first property above. To fix that we have done two changes:

If an SC load reads from an atomic store of any ordering, then a later SC load cannot read
from an earlier store in the location's modification order. This is to prevent creating a
backwards S edge from the second load to the first, as a result of C++20's coherence-ordered
before rules. (This seems to rule out behaviors that were actually permitted by the RC11 model
that C++20 intended to copy (https://plv.mpi-sws.org/scfix/paper.pdf); a change was
introduced when translating the math to English. According to Viktor Vafeiadis, this
difference is harmless. So we stick to what the standard says, and allow fewer behaviors.)
SC fences are treated like AcqRel RMWs to a global clock, to ensure they induce enough
synchronization with the surrounding accesses. This rules out legal behavior, but it is really
hard to be more precise here.

The question is, is this enough? We changed SC accesses and SC fences, but maybe the interplay of the two changed in more subtle ways in C++20 and that could still lead to us generating behaviors that C++20 does not allow.

orilahav · 2024-12-07T16:50:24Z

I suspect the first example in my previous comment shows a problem in your approach, right?
It is disallowed by C++20, but I think your algorithm allows this behavior.

(More generally speaking, I don't really understand why one should care about C++20 when we know it is broken.)

RalfJung · 2024-12-07T17:03:33Z

I hope my examples are correct. If you draw conclusions from them, we need to check more carefully :)

I have a hard time understanding bare examples like that, so the risk that I may be able to draw my own conclusions from them is rather low. ;)

I suspect the first example in my previous comment shows a problem in your approach, right?
It is disallowed by C++20, but I think your algorithm allows this behavior.

That looks exactly like the kind of example @cbeuw was worried about, thanks!

I can this test 1000 times and never saw the bad behavior. So I think we are fine. I'll have to figure out why, though...

I don't really understand why one should care about C++20 when we know it is broken

Because it's the model Rust officially subscribes to... because developing our own model is hard and also we need to be able to share memory with C/C++ programs.

cbeuw · 2024-12-07T17:05:35Z

The first example is prevented in our implementation:

miri/src/concurrency/weak_memory.rs

Lines 346 to 352 in 0603597

    
           } else if store_elem.timestamp <= clocks.write_seqcst[store_elem.store_index] 
        
               && store_elem.is_seqcst 
        
           { 
        
               // The current non-SC load, which may be sequenced-after an SC fence, 
        
               // cannot read-before the last SC store executed before the fence. 
        
               // C++17 §32.4 [atomics.order] paragraph 4 
        
               false

RalfJung · 2024-12-07T17:09:53Z

So here's the explanation:

We have a global last_sc_write_per_thread clock. Usually this is only updated by SC writes: when a thread does an SC write, it sets its own index in that clock to its current time. (It doesn't update the value for any other thread in that clock. So, this is like joining last_sc_write_per_thread with a clock that is 0 everywhere except for the thread that did the SC write).
On an SC fence, we join the current global last_sc_write_per_thread the write_seqcst of the current thread.
On a read, when searching the store buffer for a read we may use, if we encounter an element that was written with SC ordering and that is older than the per-thread write_seqcst, we do not consider older entries. That is the code @cbeuw just posted.

RalfJung · 2024-12-07T17:15:09Z

IIUC, this check was already present in the original POPL paper? #2512 only changed the comments here, not the logic, as far as I can tell.

orilahav · 2024-12-07T17:40:51Z

I don't really understand why one should care about C++20 when we know it is broken

Because it's the model Rust officially subscribes to... because developing our own model is hard and also we need to be able to share memory with C/C++ programs.

This is a strange reason in my mind. C++20 compilation to x86 is currently wrong!
That's much more important than unsoundness in Miri, and there is no reason to copy wrong C++20 to Miri.
You can use RC11 instead of C++20.

On an SC fence, we join the current global last_sc_write_per_thread the write_seqcst of the current thread.

Then, you do more than the RMW encoding you mentioned, bcs an RMW (+ rel/acq fences) won't do that (right?).
In that case it's hard for me to judge correctness, bcs it is not clear to me what is the declarative model your algorithm follows.

orilahav · 2024-12-07T18:22:32Z

Here are more complicated examples to try (both disallowed in RC11 IIUC):

Wx1 sc
Ry0 sc
||
Wy1 rlx
Wz1 rel
||
Rz1 acq
SC-fence
Rx0 rlx

Wx1 sc
Ry0 sc
||
Wy1 rlx
F-sc
Rz0 rlx
||
Wz1 rlx
||
Rz1 rlx
F-sc
Rx0 rlx

RalfJung · 2024-12-07T20:56:42Z

This is a strange reason in my mind. C++20 compilation to x86 is currently wrong!

That's much more important than unsoundness in Miri, and there is no reason to copy wrong C++20 to Miri. You can use RC11 instead of C++20. Sorry, I should have clarified: this is not a Miri decision. Rust documents that we use the C++20 memory model. Miri should match that. Maybe we should track this x86 problem somewhere... Is there a document or blog post or so that describes the problem?

Then, you do more than the RMW encoding you mentioned, bcs an RMW (+ rel/acq fences) won't do that (right?).

In that case it's hard for me to judge correctness, bcs it is not clear to me what is the declarative model your algorithm follows. Yeah, we also do whatever the POPL17 paper mentioned above does. Sadly I do not know of a well-documented operational model we could just use, so we are patching that POPL17 model instead. And until the C++ model situation is resolved I'd rather not spook crate authors with alerts I cannot explain...

orilahav · 2024-12-08T06:06:34Z

Maybe we should track this x86 problem somewhere... Is there a document or blog post or so that describes the problem?

This is what I was able to track:
https://cplusplus.github.io/LWG/lwg-active.html#3941

RalfJung · 2024-12-12T07:11:57Z

Here are more complicated examples to try (both disallowed in RC11 IIUC):

Miri seems to not generate either of them.

I think I understand the first of these tests (it checks the combination of SC accesses/fences being ordered with a release-acquire pair), but I don't get the second one. The lonely relaxed store in thread 3 is particularly confusing. Could you explain a bit what kind of a situation this test is targeting?

RalfJung · 2024-12-12T07:40:07Z

Okay I think I understand why that execution is impossible, I added some comments in #4090 (the test name there is test_sc_multi_fence). Would be great if you could take a look. :)

RalfJung force-pushed the scfix branch 5 times, most recently from 093d86f to 55a7f3e Compare November 24, 2024 16:57

cbeuw reviewed Dec 5, 2024

View reviewed changes

RalfJung force-pushed the scfix branch from 55a7f3e to caa2e03 Compare December 5, 2024 06:47

RalfJung added 3 commits December 5, 2024 08:03

extend some comments regarding weak memory emulation

e8d334c

move GlobalState definition further up so the types are mor concentra…

9542bc4

…ted at the top of the file

make SC fences stronger, to be correct wrt C++20

4e7f53a

RalfJung force-pushed the scfix branch from caa2e03 to 4e7f53a Compare December 5, 2024 07:03

RalfJung added this pull request to the merge queue Dec 6, 2024

Merged via the queue into rust-lang:master with commit 8b26f5c Dec 6, 2024
7 checks passed

RalfJung deleted the scfix branch December 6, 2024 07:10

RalfJung mentioned this pull request Dec 6, 2024

fix SC fence logic #4076

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix weak memory emulation to avoid generating behaviors that are forbidden under C++ 20 #4057

Fix weak memory emulation to avoid generating behaviors that are forbidden under C++ 20 #4057

RalfJung commented Nov 24, 2024 •

edited

Loading

cbeuw Dec 5, 2024

RalfJung Dec 5, 2024

cbeuw commented Dec 5, 2024

RalfJung commented Dec 6, 2024

orilahav commented Dec 6, 2024

RalfJung commented Dec 6, 2024

orilahav commented Dec 6, 2024

RalfJung commented Dec 6, 2024 •

edited

Loading

orilahav commented Dec 7, 2024

RalfJung commented Dec 7, 2024

orilahav commented Dec 7, 2024

RalfJung commented Dec 7, 2024

cbeuw commented Dec 7, 2024

RalfJung commented Dec 7, 2024

RalfJung commented Dec 7, 2024

orilahav commented Dec 7, 2024

orilahav commented Dec 7, 2024

RalfJung commented Dec 7, 2024 via email

orilahav commented Dec 8, 2024

RalfJung commented Dec 12, 2024 •

edited

Loading

RalfJung commented Dec 12, 2024

Fix weak memory emulation to avoid generating behaviors that are forbidden under C++ 20 #4057

Fix weak memory emulation to avoid generating behaviors that are forbidden under C++ 20 #4057

Conversation

RalfJung commented Nov 24, 2024 • edited Loading

cbeuw Dec 5, 2024

Choose a reason for hiding this comment

RalfJung Dec 5, 2024

Choose a reason for hiding this comment

cbeuw commented Dec 5, 2024

RalfJung commented Dec 6, 2024

orilahav commented Dec 6, 2024

RalfJung commented Dec 6, 2024

orilahav commented Dec 6, 2024

RalfJung commented Dec 6, 2024 • edited Loading

orilahav commented Dec 7, 2024

RalfJung commented Dec 7, 2024

orilahav commented Dec 7, 2024

RalfJung commented Dec 7, 2024

cbeuw commented Dec 7, 2024

RalfJung commented Dec 7, 2024

RalfJung commented Dec 7, 2024

orilahav commented Dec 7, 2024

orilahav commented Dec 7, 2024

RalfJung commented Dec 7, 2024 via email

orilahav commented Dec 8, 2024

RalfJung commented Dec 12, 2024 • edited Loading

RalfJung commented Dec 12, 2024

RalfJung commented Nov 24, 2024 •

edited

Loading

RalfJung commented Dec 6, 2024 •

edited

Loading

RalfJung commented Dec 12, 2024 •

edited

Loading