Building a local-first CRDT sync layer for encrypted email vaults
The problem with cloud-first email
Most email clients treat message storage as a server-side concern. Your data lives on someone else's infrastructure, indexed by someone else's search engine, and accessible through someone else's API. When the server is unreachable, your email is unreachable.
TwinMail inverts this model. Every message, thread, and overlay state lives in an encrypted vault on the user's device. The vault is the source of truth. Cloud sync, when enabled, replicates encrypted blobs — the server never sees plaintext content.
This creates a fundamental engineering challenge: how do you synchronize mutable state across devices without a central authority that can read the data?
Why CRDTs fit the vault model
Conflict-free replicated data types provide a mathematical guarantee: any two replicas that have received the same set of operations will converge to the same state, regardless of the order in which those operations were applied. This eliminates the need for a central coordinator.
For TwinMail's overlay state — read/unread flags, labels, snooze times, delegation permissions — this is a natural fit. Two devices can independently mark the same thread as read, and the CRDT merge function ensures both converge.
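The convergence property can be sketched with a minimal last-writer-wins register (the names and tie-break rule here are illustrative, not TwinMail's actual implementation):

```typescript
// Minimal LWW register: the value tagged with the highest
// (clock, deviceId) pair wins, so merge order does not matter.
type Tagged<T> = { value: T; clock: number; deviceId: string };

function lwwMerge<T>(a: Tagged<T>, b: Tagged<T>): Tagged<T> {
  if (a.clock !== b.clock) return a.clock > b.clock ? a : b;
  return a.deviceId > b.deviceId ? a : b; // deterministic tie-break
}

// Two devices independently set read status for the same thread.
const laptop: Tagged<boolean> = { value: true, clock: 5, deviceId: "laptop" };
const phone: Tagged<boolean> = { value: false, clock: 7, deviceId: "phone" };

// Either merge order yields the phone's write (clock 7).
const ab = lwwMerge(laptop, phone);
const ba = lwwMerge(phone, laptop);
```

Because the merge function is commutative, associative, and idempotent, replicas can exchange operations in any order and still agree.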
The sync protocol
Our sync layer operates in three phases:
1. Local mutation
When a user performs an action — marking a thread as read, applying a label, snoozing a message — the client creates an operation record. Each operation is a tuple of (lamport_timestamp, device_id, operation_type, target, payload).
```typescript
interface Operation {
  clock: LamportClock;
  device: DeviceID;
  op: "set_overlay" | "delete_overlay" | "merge_labels";
  target: ThreadID | MessageID;
  payload: EncryptedBytes;
}
```
Operations are appended to a local log and immediately applied to the local vault state.
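The Lamport timestamp that orders these operations follows the standard discipline: tick before each local operation, and advance past any remote clock observed during merge. A sketch (the class shape is an assumption, not TwinMail's code):

```typescript
// Lamport clock: monotone counter that never falls behind any
// timestamp this device has seen.
class LamportClock {
  constructor(public time: number = 0) {}

  // Called when creating a local operation; returns the op's timestamp.
  tick(): number {
    this.time += 1;
    return this.time;
  }

  // Called when applying a remote operation carrying `remote` as its clock.
  observe(remote: number): void {
    this.time = Math.max(this.time, remote);
  }
}

const clock = new LamportClock();
clock.tick();     // local op stamped 1
clock.observe(9); // remote op carried clock 9
clock.tick();     // next local op stamped 10, after the remote op
```

This guarantees that any operation created after observing a remote operation is ordered after it, which is what the LWW merge relies on.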
2. Encrypted replication
When connectivity is available, the device pushes its operation log to the sync server. The server stores encrypted operation blobs keyed by vault ID and clock range. It never decrypts, parses, or indexes the content.
The server's role is limited to:
- Accepting encrypted operation blobs
- Serving operation blobs for a given vault and clock range
- Notifying connected devices of new operations via WebSocket
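Because the server never interprets blob contents, its storage role reduces to an opaque, clock-ordered log per vault. A hypothetical in-memory sketch (names are illustrative):

```typescript
// Opaque store of encrypted blobs keyed by vault ID, ordered by clock.
// The server can sort and filter by clock without reading the bytes.
type EncryptedBlob = { clock: number; bytes: Uint8Array };

class VaultBlobStore {
  private vaults = new Map<string, EncryptedBlob[]>();

  push(vaultId: string, blob: EncryptedBlob): void {
    const log = this.vaults.get(vaultId) ?? [];
    log.push(blob);
    log.sort((a, b) => a.clock - b.clock);
    this.vaults.set(vaultId, log);
  }

  // Serve blobs within an inclusive clock range, as in phase 2.
  range(vaultId: string, from: number, to: number): EncryptedBlob[] {
    return (this.vaults.get(vaultId) ?? []).filter(
      (b) => b.clock >= from && b.clock <= to,
    );
  }
}

const store = new VaultBlobStore();
store.push("vault-1", { clock: 3, bytes: new Uint8Array([1]) });
store.push("vault-1", { clock: 7, bytes: new Uint8Array([2]) });
// range("vault-1", 1, 5) serves only the clock-3 blob.
```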
3. Remote merge
When a device receives operations from the sync server, it decrypts them locally and applies the CRDT merge function. For TwinMail's overlay state, we use a last-writer-wins register for scalar values (read status, snooze time) and an observed-remove set for collection values (labels, tags).
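The observed-remove semantics can be sketched as follows: each add carries a unique tag, and a remove tombstones only the tags it has observed, so a concurrent add on another device survives the merge. (This is a textbook OR-Set sketch, not TwinMail's actual implementation.)

```typescript
// Observed-remove set: removes affect only observed add-tags.
class ORSet<T> {
  private adds = new Map<string, T>();    // tag -> element
  private tombstones = new Set<string>(); // tags whose removal was observed

  add(element: T, tag: string): void {
    if (!this.tombstones.has(tag)) this.adds.set(tag, element);
  }

  remove(element: T): void {
    for (const [tag, e] of this.adds) {
      if (e === element) {
        this.adds.delete(tag);
        this.tombstones.add(tag);
      }
    }
  }

  merge(other: ORSet<T>): void {
    for (const tag of other["tombstones"]) this.tombstones.add(tag);
    for (const [tag, e] of other["adds"]) {
      if (!this.tombstones.has(tag)) this.adds.set(tag, e);
    }
    for (const tag of this.tombstones) this.adds.delete(tag);
  }

  has(element: T): boolean {
    return [...this.adds.values()].includes(element);
  }
}

const a = new ORSet<string>();
a.add("work", "laptop:1");

const b = new ORSet<string>();
b.merge(a);       // b has observed tag laptop:1
b.remove("work"); // tombstones laptop:1

a.add("work", "laptop:2"); // concurrent re-add with a fresh tag
a.merge(b);
// "work" survives on a: the remove covered only laptop:1.
```

This is exactly the label behavior in the table below: a remove on one device cannot clobber a concurrent add it never saw.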
Encryption boundary design
The critical invariant is that the sync server never sees plaintext. We achieve this by encrypting each operation independently using the vault's XChaCha20-Poly1305 key before transmission.
The vault key itself is derived from the user's passphrase via Argon2id. Devices that belong to the same vault share the derived key through a one-time secure key exchange during device pairing.
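The boundary can be illustrated with Node's standard library, with two stand-ins clearly flagged: Node's built-in AEAD is chacha20-poly1305 (not XChaCha20-Poly1305, which uses a larger 24-byte nonce) and its built-in KDF is scrypt (not Argon2id). The shape of the round trip is what matters; the passphrase, salt, and payload are made up for the example.

```typescript
import {
  createCipheriv,
  createDecipheriv,
  randomBytes,
  scryptSync,
} from "node:crypto";

// Derive a 32-byte vault key from the passphrase.
// (scrypt stands in for Argon2id here.)
const vaultKey = scryptSync("correct horse battery staple", "vault-salt", 32);

function encryptOp(plaintext: Buffer): Buffer {
  const nonce = randomBytes(12); // XChaCha20 would allow a 24-byte nonce
  const cipher = createCipheriv("chacha20-poly1305", vaultKey, nonce, {
    authTagLength: 16,
  });
  const ct = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  // Blob layout: nonce || ciphertext || auth tag
  return Buffer.concat([nonce, ct, cipher.getAuthTag()]);
}

function decryptOp(blob: Buffer): Buffer {
  const nonce = blob.subarray(0, 12);
  const tag = blob.subarray(blob.length - 16);
  const ct = blob.subarray(12, blob.length - 16);
  const decipher = createDecipheriv("chacha20-poly1305", vaultKey, nonce, {
    authTagLength: 16,
  });
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ct), decipher.final()]);
}

const blob = encryptOp(Buffer.from('{"op":"set_overlay"}'));
// The server stores `blob` opaquely; only paired devices hold vaultKey.
const roundTrip = decryptOp(blob).toString();
```

The authenticated tag means a tampered blob fails decryption outright, so the server cannot silently modify operations either.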
Conflict resolution strategy
Not all conflicts are equal. TwinMail uses different CRDT strategies based on the semantic meaning of the data:
| Data type | CRDT strategy | Rationale |
|---|---|---|
| Read/unread | LWW register | Last action wins; "mark as read" on any device should propagate |
| Labels | OR-Set | Adding a label on one device should not remove it from another |
| Snooze time | LWW register | Only the most recent snooze intent matters |
| Delegation | OR-Set with tombstones | Revocations must propagate reliably |
Performance characteristics
The sync layer is designed for email-scale workloads. A typical user has tens of thousands of messages but modifies overlay state on only a small fraction per session.
Benchmarks on a 2024 consumer laptop show:
- Local operation apply: under 1ms per operation
- Merge of 1,000 remote operations: under 50ms
- Full vault reindex after sync: under 200ms for 50,000 messages
These numbers make it practical to sync on every app focus event without perceptible latency.
What we learned
Building a CRDT sync layer for encrypted data is harder than building one for plaintext. The encryption boundary means you cannot inspect operation payloads on the server for debugging, conflict analysis, or compression. Every optimization must happen client-side.
The payoff is a sync system where the server is a dumb pipe. It stores bytes, delivers bytes, and knows nothing about what those bytes mean. That is a property worth engineering for.