Can a C++ Compiler Optimize Away Relaxed Atomic Loads?

June 17, 2026

Introduction

In multi-threaded C++ programming, the atomic operations library (std::atomic and the <atomic> header) provides the tools needed to avoid data races and coordinate threads. However, developers often wonder how far the compiler's optimizer can go under the "as-if" rule (defined in [intro.abstract]). Specifically, can a conforming C++ compiler optimize away or transform a loop containing a relaxed atomic load?

In this article, we will analyze two code snippets to understand the limits of compiler optimization, how happens-before ordering rules out certain executions, and how the memory model constrains what a compiler may do with atomic operations.

Part 1: Can #2 Read #1 in the First Snippet?

Let's look at the first example provided:

#include <atomic>
#include <thread>

int main(){
  std::atomic<int> ack = 1;
  std::atomic<int> val = 0;
  std::jthread t1([&]{
    while(ack.load(std::memory_order::relaxed) != 0); // #0
    val.store(1, std::memory_order::release);        // #1
  });
  std::jthread t2([&]{
    val.load(std::memory_order::acquire);            // #2
    ack.fetch_sub(1, std::memory_order::relaxed);    // #3
    val.load(std::memory_order::acquire);            // #4
  });
}

Question: Is there a valid execution where the load at #2 reads the value stored at #1?

Answer: No, there is no valid execution where #2 reads #1.

To understand why, let's trace the execution dependencies:

For #2 to read #1, #1 must have executed.
For #1 to execute, the loop at #0 must have exited.
For the loop at #0 to exit, ack must have been loaded as 0.
Since ack is initialized to 1, the only way it becomes 0 is if #3 (the fetch_sub) executes.
However, #3 is sequenced-after #2 in thread t2.

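If #2 were to read #1, these dependencies would form a causal loop, and the memory model rules it out precisely. #1 is a release store and #2 is an acquire load, so if #2 reads #1, then #1 synchronizes with #2 and therefore happens before it. The final load at #0 is sequenced before #1, and #2 is sequenced before #3, so that load happens before #3. However, [intro.races] requires a load to take its value from a side effect that the load does not happen before, so the final load at #0 cannot read the 0 written by #3. The loop could then never exit and #1 could never execute, contradicting the assumption. Therefore, no conforming execution allows #2 to read #1. Note that the argument depends on the release/acquire pair on val: if #1 and #2 were relaxed, there would be no synchronizes-with edge, and this reasoning would no longer apply.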

Part 2: Can the Compiler Optimize Away the Loop's Atomic Load?

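Now, let's look at the second variant, in which the loop in t1 gains a timeout: it gives up after five non-zero reads of ack. The declarations of ack and val and the thread t2 are unchanged from the first snippet: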

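#include <atomic>
#include <chrono>
#include <thread>

using namespace std::chrono_literals;  // enables the 1000ms literal

int main(){
  std::atomic<int> ack = 1;
  std::atomic<int> val = 0;
  std::jthread t1([&]{
    int count = 0;
    while(ack.load(std::memory_order::relaxed) != 0){ // #0
      if(++count == 5){
        break;  // timeout: give up after five non-zero reads
      }
      std::this_thread::sleep_for(1000ms);
    }
    val.store(1, std::memory_order::release); // #1
  });
  std::jthread t2([&]{  // unchanged from the first snippet
    val.load(std::memory_order::acquire);            // #2
    ack.fetch_sub(1, std::memory_order::relaxed);    // #3
    val.load(std::memory_order::acquire);            // #4
  });
}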

Question: Is a conforming C++ compiler permitted to optimize the loop in t1 to a fixed loop that completely eliminates the ack.load(), like this?

while(count < 4){
  ++count;
  std::this_thread::sleep_for(1000ms);
}

Answer: No, a conforming implementation is not permitted to perform this transformation.

Why the "As-If" Rule Does Not Allow This Transformation

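The argument for allowing this optimization relies on [intro.abstract]. An implementation only has to produce the observable behavior of one of the possible executions of the abstract machine, and observable behavior covers volatile accesses, data written to files, and interactive I/O, not ordinary atomic loads. The memory model allows an execution in which all five loads of ack happen to read 1. The argument therefore concludes that the compiler may simply commit to that execution every time.

However, this reasoning runs into several problems: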

1. Early Exit Is No Longer Possible (Timing and Interleaving)

Suppose t2 sets ack to 0 during the first second. The original loop then sees the new value at its next check, after the first one-second sleep, and exits. t1 then executes #1 (storing 1 to val) right away.

After the transformation, t1 always sleeps four times before writing to val, even if ack became 0 immediately. Timing on its own is not observable behavior under [intro.abstract], and sleep_for only guarantees a minimum delay, so the longer wait is not the real problem. The real problem is that the transformed loop fixes the result of every load in advance. t1 behaves as if it never saw t2's write, however early that write happened, so the cross-thread coordination the program relies on can only ever take its timeout path.

2. Visibility and Progress Requirements

The coherence rules in [intro.races] (write-read coherence, read-read coherence, and so on) only restrict which values a load may return. On their own, they never force a load to observe a particular store. The visibility guarantee comes from two separate clauses:

[intro.progress]: an implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation becomes visible to all other threads in a finite period of time.
[atomics.order]: implementations should make atomic stores visible to atomic loads within a reasonable amount of time.

Both clauses are worded as "should", that is, as normative encouragement rather than hard requirements. Still, removing the ack.load() call entirely makes it impossible for t1 to ever observe a write to ack from another thread, which is exactly the outcome these clauses are meant to exclude.

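3. The Compiler Cannot Assume ack Never Changes

Atomic accesses are not volatile accesses. A non-volatile atomic load is a value computation, not a side effect, and it is not part of observable behavior. The standard therefore does not forbid every optimization of atomics. For example, two adjacent relaxed loads of the same variable may be merged, because an execution in which no store lands between them is always possible. What the compiler cannot do is assume that ack keeps its initial value. t2 writes to it, and whether t1 takes the timeout path depends on whether t1 observes that write. Deleting every load would turn one possible outcome into the only outcome. In practice, mainstream compilers treat atomic operations very conservatively and do not perform transformations like this one.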

Summary

While the C++ compiler has immense freedom to optimize single-threaded code under the as-if rule, multi-threaded coordination via std::atomic imposes strict boundaries.

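Happens-before rules exclude impossible executions, such as #2 reading #1 in snippet 1: the release/acquire pair on val would make the final load at #0 happen before the very write it needs to read.
The standard's visibility and progress requirements stand against removing a relaxed atomic load from a polling loop. Mainstream compilers do not perform such transformations either, so polling logic like t1's behaves as written.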
