Can a C++ Compiler Optimize Away Relaxed Atomic Loads?
Introduction
In multi-threaded C++ programming, the std::atomic library provides the tools necessary to avoid data races and coordinate threads. However, developers often wonder how far the compiler's optimizer can go under the "as-if" rule (defined in [intro.abstract]). Specifically, can a conforming C++ compiler optimize away or transform a loop containing a relaxed atomic load?
In this article, we will analyze two fascinating code snippets to understand the limits of compiler optimizations, causal loops in memory orders, and why atomic operations are treated differently by the abstract machine.
Part 1: Can #2 Read #1 in the First Snippet?
Let's look at the first example provided:
#include <atomic>
#include <thread>
int main(){
    std::atomic<int> ack = 1;
    std::atomic<int> val = 0;
    std::jthread t1([&]{
        while(ack.load(std::memory_order::relaxed) != 0); // #0
        val.store(1, std::memory_order::release); // #1
    });
    std::jthread t2([&]{
        val.load(std::memory_order::acquire); // #2
        ack.fetch_sub(1, std::memory_order::relaxed); // #3
        val.load(std::memory_order::acquire); // #4
    });
}
Question: Is there a valid execution where the load at #2 reads the value stored at #1?
Answer: No, there is no valid execution where #2 reads #1.
To understand why, let's trace the execution dependencies:
- For #2 to read #1, #1 must have executed.
- For #1 to execute, the loop at #0 must have exited.
- For the loop at #0 to exit, ack must have been loaded as 0.
- Since ack is initialized to 1, the only way it becomes 0 is if #3 (the fetch_sub) executes.
- However, #3 is sequenced after #2 in thread t2.
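If #2 were to read #1, the acquire load at #2 would synchronize with the release store at #1, so everything sequenced before #1 in t1, including the final load at #0 that read 0, would happen before #2 and therefore before #3. But that final load at #0 can only read 0 from the write made by #3 itself. A load that happens before a modification must take its value from a write that precedes that modification in the modification order of ack (read-write coherence, [intro.races]), so #0 cannot read the value written by #3. This is the impossible causal loop: #1 cannot happen unless #3 has already executed and its write to ack has been observed by #0, yet #2 reading #1 would place #0 before #3. Therefore, no conforming execution allows #2 to read #1.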
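The argument above can be exercised with a small stress harness. This is only a sketch: the names Observed, run_once and count_reads_of_1_at_2 are hypothetical, std::thread with explicit joins stands in for std::jthread so the results can be read after both threads finish, and the pre-C++20 spellings such as std::memory_order_relaxed are used so that the sketch also compiles with older standards. A stress run cannot prove that an outcome is impossible; it can only fail to observe it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Values observed by thread t2 in one run of the first snippet.
struct Observed {
    int at_2;  // value read by the load at #2
    int at_4;  // value read by the load at #4
};

// Runs the first snippet once and reports what #2 and #4 read.
Observed run_once() {
    std::atomic<int> ack{1};
    std::atomic<int> val{0};
    Observed seen{-1, -1};
    std::thread t1([&] {
        while (ack.load(std::memory_order_relaxed) != 0) {}  // #0
        val.store(1, std::memory_order_release);             // #1
    });
    std::thread t2([&] {
        seen.at_2 = val.load(std::memory_order_acquire);     // #2
        ack.fetch_sub(1, std::memory_order_relaxed);         // #3
        seen.at_4 = val.load(std::memory_order_acquire);     // #4
    });
    t1.join();
    t2.join();  // joining makes the writes to `seen` visible here
    return seen;
}

// Counts how many of `runs` executions had #2 read the value stored by #1.
int count_reads_of_1_at_2(int runs) {
    assert(runs > 0);
    int hits = 0;
    for (int i = 0; i < runs; ++i) {
        if (run_once().at_2 == 1) {
            ++hits;
        }
    }
    return hits;
}
```

Calling count_reads_of_1_at_2 with any number of runs is expected to return 0, while at_4 may legitimately be either 0 or 1 depending on whether t1 has already stored to val.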
Part 2: Can the Compiler Optimize Away the Loop's Atomic Load?
Now, let's look at the second variant where a timeout/break condition is introduced inside the loop:
#include <atomic>
#include <chrono>
#include <thread>
using namespace std::chrono_literals; // needed for the 1000ms literal
// Inside main(), with ack, val, and t2 declared as in the first snippet:
std::jthread t1([&]{
    int count = 0;
    while(ack.load(std::memory_order::relaxed) != 0){ // #0
        if(++count == 5){
            break;
        }
        std::this_thread::sleep_for(1000ms);
    }
    val.store(1, std::memory_order::release); // #1
});
Question: Is a conforming C++ compiler permitted to optimize the loop in t1 to a fixed loop that completely eliminates the ack.load(), like this?
while(count < 4){
    ++count;
    std::this_thread::sleep_for(1000ms);
}
Answer: No, a conforming implementation is not permitted to perform this transformation.
Why the "As-If" Rule Does Not Allow This Transformation
The argument for allowing this optimization usually relies on [intro.abstract] p6, which states that an implementation must produce the observable behavior of one of the possible executions of the abstract machine.
However, this optimization violates several core rules of the C++ memory model:
1. Elimination of Observable Behavior (Timing and Interleaving)
If another thread modifies ack to 0 during the first second of execution, the original program's loop at #0 would exit early (after 1 second). The program would then immediately execute #1 (storing 1 to val).
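If the compiler transforms the loop into a fixed sequence of four one-second sleeps, the program is forced to wait 4 seconds before writing to val, even if ack became 0 immediately. Timing by itself is not observable behavior in the sense of [intro.abstract], so the delay alone does not settle the question. The deeper problem is that the transformed code can no longer take the control-flow path in which #0 reads 0, so t1 never reacts to the write from t2, and that changes the logical flow of synchronization across threads.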
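The difference between the two loops can be sketched by counting how many times each one sleeps. The helper names original_loop and transformed_loop, the sleeps counter, and the interval parameter are assumptions added for illustration (a short interval keeps the experiment quick); they are not part of the original snippets.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Mirrors the original loop at #0 and returns how many times it slept.
int original_loop(std::atomic<int>& ack, std::chrono::milliseconds interval) {
    assert(interval.count() >= 0);
    int count = 0;
    int sleeps = 0;
    while (ack.load(std::memory_order_relaxed) != 0) {  // #0
        if (++count == 5) {
            break;
        }
        std::this_thread::sleep_for(interval);
        ++sleeps;
    }
    return sleeps;
}

// Mirrors the proposed transformation: ack is never loaded.
int transformed_loop(std::chrono::milliseconds interval) {
    assert(interval.count() >= 0);
    int count = 0;
    int sleeps = 0;
    while (count < 4) {
        ++count;
        std::this_thread::sleep_for(interval);
        ++sleeps;
    }
    return sleeps;
}
```

If ack is already 0 when original_loop starts, it returns without sleeping at all, whereas transformed_loop always sleeps four times. The two agree only in the executions where every load at #0 reads a nonzero value.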
2. The Coherence Requirements (Read-Write Coherence)
Under the C++ memory model, atomic objects are subject to the coherence rules in [intro.races], and the visibility of atomic stores is addressed in [atomics.order] and [intro.progress]:
- If a thread writes to an atomic variable, the standard says implementations should make that store visible to atomic loads in other threads within a reasonable amount of time, and that the latest value in the modification order should become visible to all other threads in a finite period of time.
- By completely removing the ack.load() call, the compiler makes it impossible for t1 to ever observe a write to ack from another thread. This is at odds with the standard's expectation that atomic writes eventually become visible to atomic loads on other threads.
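As a rough illustration of that visibility expectation, the following sketch polls an atomic with relaxed loads while another thread clears it after a short delay. The function name poll_until_cleared, its parameters, and the use of a plain relaxed store (instead of the fetch_sub in the article's snippets) are assumptions made for this example. The deadline exists only so that the sketch terminates even on a hypothetical implementation that never made the store visible.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Polls `ack` with relaxed loads until it reads 0 or `deadline` elapses.
// Returns true if the writer thread's store was observed in time.
bool poll_until_cleared(std::chrono::milliseconds writer_delay,
                        std::chrono::milliseconds deadline) {
    assert(writer_delay < deadline);
    std::atomic<int> ack{1};
    std::thread writer([&] {
        std::this_thread::sleep_for(writer_delay);
        ack.store(0, std::memory_order_relaxed);
    });
    const auto give_up = std::chrono::steady_clock::now() + deadline;
    bool seen = false;
    while (std::chrono::steady_clock::now() < give_up) {
        if (ack.load(std::memory_order_relaxed) == 0) {
            seen = true;  // the relaxed load observed the other thread's store
            break;
        }
        std::this_thread::yield();
    }
    writer.join();
    return seen;
}
```

On mainstream implementations a call such as poll_until_cleared with a 10 ms writer delay and a generous deadline is expected to return true. A loop with the load removed could never do so, whatever the writer does.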
3. Atomic Accesses Are Not Ordinary Memory Accesses
std::atomic is not volatile, and an atomic load is not formally a side effect in the abstract machine, but the compiler still cannot treat an atomic object like a plain variable. It cannot assume that the value is constant or unchanged by other threads, and it cannot optimize away an atomic load whose result decides control flow unless it can prove that no other thread could possibly write to the object (which is not the case here, as t2 writes to ack).
Summary
While the C++ compiler has immense freedom to optimize single-threaded code under the as-if rule, multi-threaded coordination via std::atomic imposes strict boundaries.
- The coherence rules forbid causal loops, which rules out executions such as #2 reading #1 in snippet 1.
- Coherence and forward progress rules prevent the compiler from optimizing away relaxed atomic loads in loops, ensuring your concurrent synchronization logic remains safe and predictable.