The Mystery of the Missing File Size in C++ std::filesystem

If you are transitioning from the POSIX C API to C++'s <filesystem>, you might encounter a puzzling design choice. In POSIX, a single call to stat() populates a struct stat containing the file type, permissions, and size (st_size) all at once. Naturally, you would expect C++'s std::filesystem::file_status to behave similarly.

Instead, std::filesystem::file_status exposes only type() and permissions(). To get the size you must call std::filesystem::file_size() separately, which typically costs a second system call if you have already queried the status. Why was it designed this way, and how can we write efficient code around it?
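To make the split concrete, here is a minimal sketch that gathers the three values a POSIX stat() would return in one go, using two separate <filesystem> queries. The FileInfo struct and the query_file_info function are hypothetical names introduced for illustration; they are not part of the standard library.

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical aggregate for illustration; not part of the standard library.
struct FileInfo {
    fs::file_type type = fs::file_type::none;
    fs::perms permissions = fs::perms::unknown;
    std::uintmax_t size = 0;
};

// Fills `out` for an existing file. Returns false if the file does not exist
// or a query fails; `ec` carries the error when the implementation reports one.
bool query_file_info(const fs::path& p, FileInfo& out, std::error_code& ec) {
    const fs::file_status st = fs::status(p, ec);   // type and permissions only
    if (!fs::exists(st)) return false;
    out.type = st.type();
    out.permissions = st.permissions();
    out.size = 0;
    if (fs::is_regular_file(st)) {
        // The size needs its own query; file_status has nowhere to store it.
        const std::uintmax_t size = fs::file_size(p, ec);
        if (ec) return false;
        out.size = size;
    }
    return true;
}
```

The size is requested only for regular files because the result of file_size() for other file types is implementation-defined.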

1. Cross-Platform Abstraction (The Windows Factor)

The most plausible explanation for this design is portability. The std::filesystem library is heavily based on boost::filesystem, which was designed to work across both UNIX-like systems and Windows.

On POSIX systems, file type, permissions, and size all come back from the same stat call. Windows has no single equivalent. The same information is spread across several APIs (for example GetFileAttributesExW, or handle-based queries such as GetFileInformationByHandle), there are no POSIX-style permission bits, and the cost of each query can vary with the file system (e.g., NTFS vs. network shares).

Limiting file_status to type and permissions gives the object the same meaning on every platform and leaves the size to a dedicated query, instead of modeling the interface on one operating system's struct.

2. Semantic Separation of Concerns

Conceptually, file_status describes what a file is: a regular file, a directory, or a symlink, and which access permissions it has.

File size is a different kind of metadata. It changes every time the file is written to, whereas type and permissions change rarely. A file_status that also carried a size would go stale much sooner, and callers that only want the type or permissions would carry around a value they never use.

3. The Clever Design of std::filesystem::directory_entry

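If you are iterating through a directory using std::filesystem::directory_iterator, note that std::filesystem::directory_entry has its own file_size() member. The standard allows, but does not require, a directory_entry to cache attributes obtained during iteration, so on some implementations entry.file_size() needs no extra system call:

for (const auto& entry : std::filesystem::directory_iterator(path)) {
    // The size of anything other than a regular file is implementation-defined.
    if (!entry.is_regular_file()) continue;
    // May be answered from data cached during iteration; an implementation
    // that did not cache the size queries the file system here instead.
    std::cout << entry.path() << " : " << entry.file_size() << '\n';
}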

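Whether this actually saves a call depends on the platform. On Windows, FindFirstFileW/FindNextFileW return a WIN32_FIND_DATAW record for every entry, which already contains the attributes, timestamps, and size. On Linux, readdir returns only the name, the inode number, and (on most file systems) a type hint in d_type, so the size still requires a stat-family call.

std::filesystem::directory_entry is designed as a cache for whatever the OS provides during traversal. Where the data is available, this avoids the "N+1" system call problem; where it is not, the member functions fall back to an ordinary query. Cached values can go stale, and refresh() re-reads them.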
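Because a directory_entry may hold cached attributes, a value read from it can lag behind the file on disk. The sketch below shows the refresh() member bringing an entry up to date after the file has grown. The helper name size_after_append is hypothetical and exists only for this illustration.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Appends `extra` to the file behind `entry`, then returns the size the entry
// reports after an explicit refresh(). Hypothetical helper for illustration.
std::uintmax_t size_after_append(fs::directory_entry& entry, const std::string& extra) {
    {
        std::ofstream out(entry.path(), std::ios::binary | std::ios::app);
        out << extra;
    }   // the stream is flushed and closed here, before the size is queried

    // Without refresh(), entry.file_size() could return either the old or the
    // new size, depending on whether the implementation cached the value.
    entry.refresh();             // throws fs::filesystem_error on failure
    return entry.file_size();
}
```

Calling refresh() always costs a file system query, so it is best reserved for cases where the file is known or suspected to have changed since the entry was created.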

How to Write Efficient C++ Filesystem Code

To avoid redundant system calls in your applications, follow these best practices:

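  • Use directory_entry when iterating: If you are traversing a directory, query metadata through the directory_entry members (file_size(), last_write_time(), is_regular_file(), and so on) rather than passing the path back to free-standing functions like std::filesystem::file_size(path). The members can use cached values where the implementation has them, and cost no more than the free functions where it does not.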
  • Query explicitly when needed: If you only need the size of a single file, call std::filesystem::file_size() directly. On POSIX this typically maps to a single stat() call, which is cheap when the kernel already has the file's metadata cached.
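As a sketch of the first practice, the function below sums the sizes of the regular files in one directory using only directory_entry members and the non-throwing error_code overloads. The name total_regular_file_size is hypothetical, and skipping entries that fail to report a size is an assumption made for this example rather than a recommendation for every program.

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Sums the sizes of the regular files directly inside `dir` (non-recursive).
// Entries that cannot be queried are skipped. `ec` reports iteration errors.
// Hypothetical helper for illustration.
std::uintmax_t total_regular_file_size(const fs::path& dir, std::error_code& ec) {
    std::uintmax_t total = 0;
    fs::directory_iterator it(dir, ec);   // becomes the end iterator on error
    const fs::directory_iterator end;
    while (!ec && it != end) {
        const fs::directory_entry& entry = *it;
        std::error_code entry_ec;
        // Both member calls may be served from data cached during iteration,
        // or may fall back to a fresh query; either is allowed.
        if (entry.is_regular_file(entry_ec) && !entry_ec) {
            const std::uintmax_t size = entry.file_size(entry_ec);
            if (!entry_ec) total += size;
        }
        it.increment(ec);                 // non-throwing advance
    }
    return total;
}
```

Going through the entry rather than std::filesystem::file_size(entry.path()) leaves the choice between cached and fresh data to the implementation, which is where the potential saving comes from.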