# mmap-sync

`mmap-sync` is a Rust crate designed to manage high-performance, concurrent data access between a single writer process and multiple reader processes, leveraging the benefits of memory-mapped files, wait-free synchronization, and zero-copy deserialization.
We're using `mmap-sync` for large-scale machine learning, as detailed in our blog post: "Every Request, Every Microsecond: Scalable machine learning at Cloudflare".
At the core of `mmap-sync` is a `Synchronizer` structure that offers a simple interface with "write" and "read" methods, allowing users to read and write any Rust struct (`T`) that implements or derives certain rkyv traits.
```rust
impl Synchronizer {
    /// Write a given `entity` into the next available memory-mapped file.
    pub fn write<T>(&mut self, entity: &T, grace_duration: Duration) -> Result<(usize, bool), SynchronizerError> {
        …
    }

    /// Reads and returns an `entity` struct from mapped memory, wrapped in `ReadResult`.
    pub fn read<T>(&mut self) -> Result<ReadResult<T>, SynchronizerError> {
        …
    }
}
```
Data is stored in shared mapped memory, allowing the `Synchronizer` to "write" and "read" from it concurrently. This makes `mmap-sync` a highly efficient and flexible tool for managing shared, concurrent data access.
The use of memory-mapped files offers several advantages over other inter-process communication (IPC) mechanisms.
It allows different processes to access the same memory space, bypassing the need for costly serialization and deserialization.
This design allows `mmap-sync` to provide extremely fast, low-overhead data sharing between processes.
Our wait-free data access pattern draws inspiration from the Linux kernel's Read-Copy-Update (RCU) pattern and the Left-Right concurrency control technique. In our solution, we maintain two copies of the data in separate memory-mapped files. Write access to this data is managed by a single writer, with multiple readers able to access the data concurrently.
We store the synchronization state, which coordinates access to these data copies, in a third memory-mapped file, referred to as "state".
This file contains an atomic 64-bit integer, which represents an `InstanceVersion`, and a pair of additional atomic 32-bit variables tracking the number of active readers for each data copy. The `InstanceVersion` consists of the currently active data file index (1 bit), the data size (39 bits, accommodating data sizes up to 549 GB), and a data checksum (24 bits).
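As a rough illustration, a version like this can be packed into and unpacked from a single 64-bit value along the following lines (a minimal sketch; the exact bit order and the helper names are assumptions for illustration, not the crate's actual internals):

```rust
/// Hypothetical layout of a 64-bit instance version:
/// bit 0       -> active data file index (0 or 1)
/// bits 1..40  -> data size (39 bits, up to 2^39 bytes ≈ 549 GB)
/// bits 40..64 -> data checksum (24 bits)
fn pack_version(idx: u64, size: u64, checksum: u64) -> u64 {
    debug_assert!(idx <= 1 && size < (1 << 39) && checksum < (1 << 24));
    idx | (size << 1) | (checksum << 40)
}

fn unpack_version(version: u64) -> (u64, u64, u64) {
    let idx = version & 1;
    let size = (version >> 1) & ((1 << 39) - 1);
    let checksum = version >> 40;
    (idx, size, checksum)
}
```

Because all three fields fit into one atomic integer, the active data copy, its size, and its checksum can be published with a single atomic store, which is part of what keeps readers wait-free.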
To efficiently store and fetch data, `mmap-sync` utilizes zero-copy deserialization with the help of the rkyv library, directly referencing bytes in the serialized form.
This significantly reduces the time and memory required to access and use data.
The templated type `T` for `Synchronizer` can be any Rust struct implementing the specified `rkyv` traits.
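For example, an entity type can derive the necessary traits like this (a minimal sketch using rkyv's derive macros; the `HelloWorld` struct here mirrors the one used in the examples below):

```rust
use rkyv::{Archive, Deserialize, Serialize};

/// A struct deriving rkyv's `Archive`, `Serialize`, and `Deserialize`
/// can be used as the `T` in `Synchronizer::write` and `Synchronizer::read`.
#[derive(Archive, Serialize, Deserialize, Debug, PartialEq)]
#[archive(check_bytes)] // enables validation of archived bytes via bytecheck
pub struct HelloWorld {
    pub version: u32,
    pub messages: Vec<String>,
}
```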
To use `mmap-sync`, add it to your `Cargo.toml` under `[dependencies]`:
```toml
[dependencies]
mmap-sync = "1"
```
Then, import `mmap-sync` in your Rust program:
```rust
use mmap_sync::synchronizer::Synchronizer;
```
Check out the provided examples for detailed usage:

- Writer process example: This example demonstrates how to define a Rust struct and write it into shared memory using `mmap-sync`.
- Reader process example: This example shows how to read data written into shared memory by a writer process.
These examples share a common module that defines the data structure being written and read.
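Condensed, the whole flow looks roughly like this (a sketch based on the `write`/`read` signatures shown earlier and the `HelloWorld` struct sketched above; the path and values are illustrative, and the shipped examples may differ in detail):

```rust
use std::error::Error;
use std::time::Duration;
use mmap_sync::synchronizer::Synchronizer;

// `HelloWorld` is the rkyv-derived struct from the earlier snippet.

fn main() -> Result<(), Box<dyn Error>> {
    // Both processes open the same base path; the `_data_0`, `_data_1`,
    // and `_state` file names are derived from it.
    let mut synchronizer = Synchronizer::new("/tmp/hello_world".as_ref());

    // Writer side: publish an entity, granting readers of the copy being
    // replaced a grace period before it is overwritten.
    let data = HelloWorld {
        version: 7,
        messages: vec!["Hello".into(), "World".into(), "!".into()],
    };
    let (written, reset) = synchronizer.write(&data, Duration::from_secs(1))?;
    println!("written: {written} bytes | reset: {reset}");

    // Reader side (normally a separate process): obtain a zero-copy view
    // of the entity currently published in shared memory.
    let archived = synchronizer.read::<HelloWorld>()?;
    println!("version: {} messages: {:?}", archived.version, archived.messages);
    Ok(())
}
```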
To run these examples, follow these steps:

- Open a terminal and navigate to your project directory.
- Execute the writer example with the command `cargo run --example writer`.
- In the same way, run the reader example using `cargo run --example reader`.
Upon successful execution of these examples, the terminal output should resemble:
```
# Writer example
written: 36 bytes | reset: false
# Reader example
version: 7 messages: ["Hello", "World", "!"]
```
Moreover, the following files will be created:
```
$ stat -c '%A %s %n' /tmp/hello_world_*
-rw-r----- 36 /tmp/hello_world_data_0
-rw-r----- 36 /tmp/hello_world_data_1
-rw-rw---- 16 /tmp/hello_world_state
```
With these steps, you can start utilizing `mmap-sync` in your Rust applications for efficient concurrent data access across processes.
Using a `tmpfs` volume will reduce disk I/O latency, since `tmpfs` operates directly on RAM, offering faster read and write capabilities compared to conventional disk-based storage:
```rust
let mut synchronizer = Synchronizer::new("/dev/shm/hello_world".as_ref());
```
This change points the synchronizer to a shared memory object located in a `tmpfs` filesystem, which is typically mounted at `/dev/shm` on Linux systems. This should help alleviate some of the bottlenecks associated with disk I/O.
If `/dev/shm` does not provide enough space, or if you want to create a dedicated `tmpfs` instance, you can set up your own with the desired size. For example, to create a 1 GB `tmpfs` volume, you can use the following command:
```
sudo mount -t tmpfs -o size=1G tmpfs /mnt/mytmpfs
```
To run benchmarks, you first need to install the `cargo-criterion` binary:
```
cargo install cargo-criterion
```
Then you'll be able to run benchmarks with the following command:
```
cargo criterion --bench synchronizer
```
The benchmarks presented below were executed on a Linux laptop with a 13th Gen Intel(R) Core(TM) i7-13800H processor and compiler flags set to `RUSTFLAGS=-C target-cpu=native`.
```
synchronizer/write
        time:   [250.71 ns 251.42 ns 252.41 ns]
        thrpt:  [3.9619 Melem/s 3.9774 Melem/s 3.9887 Melem/s]
synchronizer/write_raw
        time:   [145.25 ns 145.53 ns 145.92 ns]
        thrpt:  [6.8531 Melem/s 6.8717 Melem/s 6.8849 Melem/s]
synchronizer/read/check_bytes_true
        time:   [40.114 ns 40.139 ns 40.186 ns]
        thrpt:  [24.884 Melem/s 24.914 Melem/s 24.929 Melem/s]
synchronizer/read/check_bytes_false
        time:   [26.658 ns 26.673 ns 26.696 ns]
        thrpt:  [37.458 Melem/s 37.491 Melem/s 37.512 Melem/s]
```