Interactive architecture map of Discord's real-time messaging infrastructure — WebSocket gateway, voice servers, database migrations, and the Elixir-Rust performance story, compiled from engineering blog posts and public sources.
Discord's backend is a polyglot architecture built on Elixir/BEAM for real-time messaging, Python for the HTTP API, Rust for performance-critical data services, and C++ for voice/video media processing. A 5-person chat infrastructure team manages ~20 Elixir microservices across 400-500 machines.
graph TD
subgraph Clients["Client Layer"]
WEB["Web Client
(Browser WebRTC)"]
DESK["Desktop Client
(C++ Engine)"]
MOB["Mobile Client
(C++ Engine)"]
end
subgraph Gateway["Gateway Layer · Elixir"]
GW["WebSocket Gateway
(Cowboy + GenStage)"]
PUBSUB["Pub/Sub Event Bus"]
end
subgraph Services["Application Services"]
API["REST API
(Python Monolith)"]
RS["Read States
(Rust + Tokio)"]
DS["Data Services
(Rust + Coalescing)"]
VS["Voice Signaling
(Elixir)"]
end
subgraph Media["Media Layer"]
SFU["Voice SFU
(C++ / DAVE E2EE)"]
MP["Media Proxy
(Rust + Lilliput)"]
end
subgraph Data["Data Layer"]
SCYLLA["ScyllaDB
(Messages)"]
ES["Elasticsearch
(Search Index)"]
CASS["Cassandra
(Read States)"]
REDIS["Redis
(Cache)"]
end
Clients --> GW
GW --> PUBSUB
Clients --> API
API --> DS
DS --> SCYLLA
DS --> ES
RS --> CASS
GW --> VS
VS --> SFU
MP --> REDIS
style WEB fill:#1a1a28,stroke:#5865F2,color:#fff
style DESK fill:#1a1a28,stroke:#5865F2,color:#fff
style MOB fill:#1a1a28,stroke:#5865F2,color:#fff
style GW fill:#12121c,stroke:#aa00ff,color:#fff
style PUBSUB fill:#12121c,stroke:#aa00ff,color:#fff
style API fill:#12121c,stroke:#ffee00,color:#fff
style RS fill:#12121c,stroke:#ff6600,color:#fff
style DS fill:#12121c,stroke:#ff6600,color:#fff
style VS fill:#12121c,stroke:#aa00ff,color:#fff
style SFU fill:#12121c,stroke:#0066ff,color:#fff
style MP fill:#12121c,stroke:#ff6600,color:#fff
style SCYLLA fill:#12121c,stroke:#00ff66,color:#fff
style ES fill:#12121c,stroke:#00ff66,color:#fff
style CASS fill:#12121c,stroke:#00ff66,color:#fff
style REDIS fill:#12121c,stroke:#00ff66,color:#fff
Elixir: Core real-time backbone. ~20 microservices using Distributed Erlang with partial mesh topology. etcd for service discovery.
Rust: Performance-critical services: Read States, Data Services (DB proxy), Media Proxy, game SDK, Go Live video capture, and Elixir NIFs.
Python: Powers the HTTP REST API monolith handling CRUD operations for guilds, channels, users, and messages.
C++: Voice/video SFU media engine, native client audio engine, and ScyllaDB itself. The custom client audio engine bypasses OS audio ducking.
The gateway is the backbone of Discord's real-time event system. Every active client maintains a persistent WebSocket connection. The gateway pushes messages, presence updates, and typing indicators without polling, using GenStage for back-pressure and load-shedding.
graph LR
subgraph Clients["Connected Clients"]
C1["Client Shard 0"]
C2["Client Shard 1"]
C3["Client Shard N"]
end
subgraph GW["Gateway Servers · Elixir"]
COW["Cowboy
WebSocket/TCP"]
GS["GenStage
Back-Pressure"]
end
subgraph Events["Event Sources"]
MSG["New Messages"]
PRES["Presence Changes"]
TYPE["Typing Indicators"]
end
subgraph Push["Push Pipeline"]
PC["Push Collector
(1 proc/machine)"]
PUSHER["Pusher Consumers
(demand: 100)"]
XMPP["Firebase XMPP"]
end
Events --> GS
GS --> COW
COW --> Clients
Events --> PC
PC --> PUSHER
PUSHER --> XMPP
style C1 fill:#1a1a28,stroke:#5865F2,color:#fff
style C2 fill:#1a1a28,stroke:#5865F2,color:#fff
style C3 fill:#1a1a28,stroke:#5865F2,color:#fff
style COW fill:#12121c,stroke:#aa00ff,color:#fff
style GS fill:#12121c,stroke:#aa00ff,color:#fff
style MSG fill:#12121c,stroke:#00f0ff,color:#fff
style PRES fill:#12121c,stroke:#00f0ff,color:#fff
style TYPE fill:#12121c,stroke:#00f0ff,color:#fff
style PC fill:#12121c,stroke:#ff00aa,color:#fff
style PUSHER fill:#12121c,stroke:#ff00aa,color:#fff
style XMPP fill:#12121c,stroke:#ff00aa,color:#fff
Discord migrated gateway compression from zlib to Zstandard, achieving a 40% reduction in bandwidth usage across all WebSocket connections.
Push notifications use a two-stage GenStage pipeline. The Push Collector (1 Erlang process per machine) buffers requests, while Pusher consumers demand exactly 100 at a time. Firebase XMPP is used instead of HTTP because XMPP enforces a 100-pending-request limit per connection, providing natural backpressure. The system handles bursts of 1M+ push requests per minute via load-shedding when the buffer fills.
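The demand-driven flow described above can be sketched in a few lines. This is a minimal single-threaded model, not Discord's GenStage code: the class names `PushCollector` and `Pusher`, the buffer size of 250, and the `ack` method are all hypothetical. It only illustrates the two ideas from the paragraph: a bounded buffer that sheds load when full, and a consumer that never has more than 100 requests pending.

```python
from collections import deque


class PushCollector:
    """Bounded buffer that sheds load when full (hypothetical sketch)."""

    def __init__(self, max_buffer: int) -> None:
        self.max_buffer = max_buffer
        self.buffer: deque = deque()
        self.dropped = 0

    def submit(self, request) -> bool:
        """Producer side: accept a request, or drop it if the buffer is full."""
        if len(self.buffer) >= self.max_buffer:
            self.dropped += 1
            return False
        self.buffer.append(request)
        return True

    def take(self, demand: int) -> list:
        """Consumer side: hand over at most `demand` buffered requests."""
        batch = []
        while self.buffer and len(batch) < demand:
            batch.append(self.buffer.popleft())
        return batch


class Pusher:
    """Consumer that keeps at most `max_pending` unacknowledged requests."""

    def __init__(self, collector: PushCollector, max_pending: int = 100) -> None:
        self.collector = collector
        self.max_pending = max_pending
        self.pending = 0
        self.sent: list = []

    def pull(self) -> int:
        """Ask the collector only for as many requests as there is room for."""
        demand = self.max_pending - self.pending
        batch = self.collector.take(demand)
        self.pending += len(batch)
        self.sent.extend(batch)
        return len(batch)

    def ack(self, count: int) -> None:
        """Downstream acknowledged `count` requests, freeing capacity."""
        self.pending -= min(count, self.pending)


collector = PushCollector(max_buffer=250)
accepted = sum(collector.submit(i) for i in range(300))  # burst larger than the buffer
pusher = Pusher(collector)
first = pusher.pull()    # fills all 100 pending slots
second = pusher.pull()   # no free slots, so nothing is requested
pusher.ack(40)
third = pusher.pull()    # only the 40 freed slots are refilled
```

Because the consumer pulls rather than the producer pushing, a slow downstream connection simply stops generating demand, and the excess is dropped at the collector instead of piling up in memory.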
Discord uses Rust NIFs (Native Implemented Functions) via Rustler to accelerate hot paths within the BEAM VM. The member list sorted insertion problem drove the first major adoption, achieving a 160x improvement over pure Elixir at scale.
graph LR
subgraph BEAM["BEAM VM (Elixir)"]
GUILD["Guild GenServer"]
MOD["SortedSet Module
(Elixir Interface)"]
end
subgraph NIF["Rust NIF Layer"]
RUSTLER["Rustler
(Safe Bindings)"]
SORTED["SortedSet
(Rust Vec of Vec buckets)"]
end
subgraph Perf["Performance at 1M Items"]
BEST["Best: 0.61 us"]
WORST["Worst: 3.68 us"]
end
GUILD --> MOD
MOD --> RUSTLER
RUSTLER --> SORTED
SORTED --> Perf
style GUILD fill:#12121c,stroke:#aa00ff,color:#fff
style MOD fill:#12121c,stroke:#aa00ff,color:#fff
style RUSTLER fill:#12121c,stroke:#ff6600,color:#fff
style SORTED fill:#12121c,stroke:#ff6600,color:#fff
style BEST fill:#0a0a0f,stroke:#00ff66,color:#00ff66
style WORST fill:#0a0a0f,stroke:#00ff66,color:#00ff66
Guilds with 100,000+ members need sorted member lists. Updating a list when a member joins requires a sorted insertion that also reports the index at which the member was inserted. Among the pure Elixir candidates (MapSet, :ordsets, custom skip-list Cells), :ordsets still took roughly 27,000 microseconds worst-case at 250K items.
The Rust SortedSet NIF handles 1,000,000 items with sub-4 microsecond worst-case latency. Every operation stays far below the 1 ms guideline for NIF calls, so the NIF does not need to yield or report reductions back to the BEAM scheduler. The NIF appears as a regular Elixir module to callers and powers every single Discord guild's member list.
| Solution | Items | Best | Worst | Language |
|---|---|---|---|---|
| MapSet | 250K | 31,644 us | 57,580 us | Elixir |
| :ordsets | 250K | 20,438 us | 27,390 us | Elixir |
| Rust SortedSet | 250K | 0.4 us | 1.2 us | Rust |
| Rust SortedSet | 1M | 0.61 us | 3.68 us | Rust |
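The operation being benchmarked, a sorted insertion that reports the index of the new item, can be illustrated as follows. This is only a sketch of the interface: it uses a flat Python list with binary search, which is not the data structure used by the Rust NIF and will not reproduce its performance. The class and method names are hypothetical.

```python
import bisect


class SortedSet:
    """Sorted collection without duplicates whose add/remove report an index."""

    def __init__(self) -> None:
        self._items: list = []

    def add(self, item):
        """Insert `item`; return its index, or None if it was already present."""
        idx = bisect.bisect_left(self._items, item)
        if idx < len(self._items) and self._items[idx] == item:
            return None
        self._items.insert(idx, item)
        return idx

    def remove(self, item):
        """Remove `item`; return the index it occupied, or None if absent."""
        idx = bisect.bisect_left(self._items, item)
        if idx < len(self._items) and self._items[idx] == item:
            del self._items[idx]
            return idx
        return None

    def to_list(self) -> list:
        return list(self._items)


members = SortedSet()
i_first = members.add("mallory")
i_second = members.add("alice")    # sorts before "mallory"
i_third = members.add("bob")       # lands between the two
i_dup = members.add("bob")         # duplicate, nothing inserted
i_removed = members.remove("alice")
```

Returning the index matters because a client rendering a member list can then be told "insert at position N" rather than receiving the whole list again.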
Read States tracks which channels and messages each user has read. It is accessed on every connection, every message send, and every read action. The Go implementation suffered from garbage collector latency spikes every 2 minutes; the Rust rewrite eliminated them entirely.
graph TD
subgraph Clients["Incoming Requests"]
CONN["Connection Events"]
SEND["Message Sends"]
READ["Read Actions"]
end
subgraph Service["Read States Service · Rust"]
TOKIO["Tokio Async Runtime"]
LRU["BTreeMap LRU Cache
(8M states/node)"]
end
subgraph Persist["Persistence"]
EVICT["Immediate Eviction
Commit"]
SCHED["Scheduled Commit
(30s window)"]
CASS["Cassandra"]
end
Clients --> TOKIO
TOKIO --> LRU
LRU --> EVICT
LRU --> SCHED
EVICT --> CASS
SCHED --> CASS
style CONN fill:#1a1a28,stroke:#5865F2,color:#fff
style SEND fill:#1a1a28,stroke:#5865F2,color:#fff
style READ fill:#1a1a28,stroke:#5865F2,color:#fff
style TOKIO fill:#12121c,stroke:#ff6600,color:#fff
style LRU fill:#12121c,stroke:#ff6600,color:#fff
style EVICT fill:#12121c,stroke:#00ff66,color:#fff
style SCHED fill:#12121c,stroke:#00ff66,color:#fff
style CASS fill:#12121c,stroke:#00ff66,color:#fff
Go forces a garbage collection at least every 2 minutes, and each run scanned the entire LRU cache to check for unreferenced memory. This caused periodic latency spikes proportional to cache size. Reducing the cache size lowered the spike magnitude but increased cache misses, a lose-lose tradeoff.
Rust's ownership-based memory model means evicted items are immediately freed with no GC scanning. Average response time dropped to microseconds, capacity increased to 8 million Read States per node, and all latency spikes were eliminated. Built on Tokio async runtime with BTreeMap for memory efficiency.
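The cache behaviour in the diagram above (commit immediately on eviction, otherwise commit on a schedule) can be sketched like this. It is an illustrative model only, assuming a tiny capacity and a `commit` callback that stands in for a Cassandra write. The names `ReadStateCache`, `update`, and `flush` are hypothetical, and a real service would run `flush` from a timer rather than by hand.

```python
from collections import OrderedDict


class ReadStateCache:
    """LRU cache that persists dirty entries on eviction or on a scheduled flush."""

    def __init__(self, capacity: int, commit) -> None:
        self.capacity = capacity
        self.commit = commit              # callable(key, state): stand-in for a DB write
        self.entries: OrderedDict = OrderedDict()
        self.dirty: set = set()

    def get(self, key):
        """Return the cached state and mark it as most recently used."""
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)
        return self.entries[key]

    def update(self, key, state) -> None:
        """Store a new state; evict and commit the least recently used entry if full."""
        self.entries[key] = state
        self.entries.move_to_end(key)
        self.dirty.add(key)
        if len(self.entries) > self.capacity:
            old_key, old_state = self.entries.popitem(last=False)
            if old_key in self.dirty:
                self.dirty.discard(old_key)
                self.commit(old_key, old_state)   # immediate eviction commit

    def flush(self) -> int:
        """Scheduled commit: write every dirty entry that is still cached."""
        written = 0
        for key in list(self.entries):
            if key in self.dirty:
                self.commit(key, self.entries[key])
                written += 1
        self.dirty.clear()
        return written


writes = []
cache = ReadStateCache(capacity=2, commit=lambda k, s: writes.append((k, s)))
cache.update(("user1", "chan1"), 5)
cache.update(("user1", "chan2"), 7)
cache.update(("user2", "chan1"), 9)    # over capacity: oldest entry is committed
writes_after_eviction = list(writes)
flushed = cache.flush()                # the two remaining dirty entries are written
flushed_again = cache.flush()          # nothing left to write
```

In Python the evicted object is still subject to the interpreter's own memory management; the sketch shows the commit policy, not the memory-model benefit that motivated the Rust rewrite.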
Discord's message storage evolved from MongoDB (2015) to Cassandra (2017) to ScyllaDB (post-2022). The Rust Data Services layer sits between the API and database, providing request coalescing and consistent hash routing for cache locality.
graph LR
M2015["2015
MongoDB
(Initial)"]
C2017["2017
Cassandra
12 nodes"]
C2022["2022
Cassandra
177 nodes"]
S2023["Post-2022
ScyllaDB
72 nodes"]
M2015 --> C2017
C2017 --> C2022
C2022 --> S2023
style M2015 fill:#1a1a28,stroke:#6a6a80,color:#b8b8cc
style C2017 fill:#1a1a28,stroke:#ffee00,color:#fff
style C2022 fill:#1a1a28,stroke:#ff0044,color:#fff
style S2023 fill:#1a1a28,stroke:#00ff66,color:#fff
graph TD
subgraph API["API Layer"]
REST["Python REST API"]
end
subgraph DS["Data Services · Rust"]
ROUTER["Consistent Hash Router
(channel_id routing)"]
COAL["Request Coalescing
(deduplicate concurrent reads)"]
end
subgraph DB["ScyllaDB Cluster"]
N1["Node 1
(9TB)"]
N2["Node 2
(9TB)"]
N3["Node 3
(9TB)"]
end
REST --> ROUTER
ROUTER --> COAL
COAL --> N1
COAL --> N2
COAL --> N3
style REST fill:#12121c,stroke:#ffee00,color:#fff
style ROUTER fill:#12121c,stroke:#ff6600,color:#fff
style COAL fill:#12121c,stroke:#ff6600,color:#fff
style N1 fill:#12121c,stroke:#00ff66,color:#fff
style N2 fill:#12121c,stroke:#00ff66,color:#fff
style N3 fill:#12121c,stroke:#00ff66,color:#fff
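Request coalescing, as shown in the diagram above, means that concurrent reads for the same key share a single database query. The following is a minimal asyncio sketch of that idea, not the Rust Data Services implementation: `Coalescer`, `fake_query`, and the 10 ms delay are hypothetical stand-ins.

```python
import asyncio


class Coalescer:
    """Share one in-flight fetch among all concurrent callers for the same key."""

    def __init__(self, fetch) -> None:
        self._fetch = fetch              # async callable(key) -> value
        self._in_flight: dict = {}       # key -> running asyncio.Task

    async def get(self, key):
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key starts the real query.
            task = asyncio.create_task(self._run(key))
            self._in_flight[key] = task
        # shield() keeps one cancelled caller from cancelling the shared query.
        return await asyncio.shield(task)

    async def _run(self, key):
        try:
            return await self._fetch(key)
        finally:
            # Later requests for the key start a fresh query.
            self._in_flight.pop(key, None)


async def demo():
    calls = []

    async def fake_query(channel_id):
        # Stand-in for a database read; records how often it really runs.
        calls.append(channel_id)
        await asyncio.sleep(0.01)
        return f"messages for {channel_id}"

    coalescer = Coalescer(fake_query)
    results = await asyncio.gather(
        *(coalescer.get(42) for _ in range(5)),   # five concurrent reads, same channel
        coalescer.get(7),                         # one read for a different channel
    )
    return calls, results


calls, results = asyncio.run(demo())
```

Coalescing only helps if requests for the same channel reach the same process, which is presumably why the diagram places the consistent hash router in front of it.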
Messages are partitioned by channel_id combined with static time buckets. Each message uses a Snowflake ID (chronologically sortable, embeds timestamp) and is replicated across 3 nodes. The migration from 177 Cassandra nodes to 72 ScyllaDB nodes was executed using a custom Rust migrator with SQLite checkpointing, completing in 9 days instead of the estimated 3 months.
| Metric | Cassandra | ScyllaDB |
|---|---|---|
| P99 Read Latency | 40-125ms | 15ms |
| P99 Write Latency | 5-70ms | 5ms (steady) |
| Cluster Size | 177 nodes | 72 nodes |
| Disk per Node | -- | 9 TB |
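The partitioning scheme described above (channel_id plus a static time bucket derived from the Snowflake ID) can be worked through in code. The Snowflake layout used here, with a millisecond timestamp in the bits above bit 22 counted from the first second of 2015, follows Discord's public API documentation. The 10-day bucket width is an assumption made for illustration, since the text only says the buckets are static, and the helper names are hypothetical.

```python
DISCORD_EPOCH_MS = 1420070400000             # 2015-01-01T00:00:00Z in Unix milliseconds
BUCKET_MS = 10 * 24 * 60 * 60 * 1000         # assumed bucket width: 10 days


def make_snowflake(unix_ms: int) -> int:
    """Build a bare Snowflake from a Unix timestamp (worker and sequence bits zero)."""
    return (unix_ms - DISCORD_EPOCH_MS) << 22


def snowflake_timestamp_ms(snowflake: int) -> int:
    """Recover the Unix timestamp embedded in a Snowflake ID."""
    return (snowflake >> 22) + DISCORD_EPOCH_MS


def make_bucket(snowflake: int) -> int:
    """Map a Snowflake to its static time bucket."""
    return (snowflake >> 22) // BUCKET_MS


def partition_key(channel_id: int, message_id: int) -> tuple:
    """Messages in the same channel and time window share a partition."""
    return (channel_id, make_bucket(message_id))


t0 = DISCORD_EPOCH_MS + 41_944_705_796       # about 485 days after the Discord epoch
msg_a = make_snowflake(t0)
msg_b = make_snowflake(t0 + 60_000)          # one minute later: same bucket
msg_c = make_snowflake(t0 + BUCKET_MS)       # one bucket width later: next bucket

key_a = partition_key(1234, msg_a)
key_b = partition_key(1234, msg_b)
key_c = partition_key(1234, msg_c)
```

Bounding each partition by time keeps a very active channel from growing into a single unbounded partition, while recent messages in a channel still sit together.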
Discord's search evolved from 2 large Elasticsearch clusters (200+ nodes each) to a modern "cell" topology of 40 smaller clusters on Kubernetes with ECK. The redesign eliminated cross-node batch failures, improved query latency from 500ms to under 100ms median, and doubled indexing throughput.
graph TD
subgraph Routing["Message Router"]
PUB["PubSub Router
(batch by destination)"]
end
subgraph GuildCell["Guild Messages Cell"]
GI["Ingest Nodes"]
GM["Master Nodes"]
GD["Data Nodes"]
end
subgraph DMCell["User DM Cell"]
DI["Ingest Nodes"]
DM["Master Nodes"]
DD["Data Nodes"]
end
subgraph Meta["Shard Mapping"]
CASS["Cassandra
(Source of Truth)"]
REDIS["Redis
(Cache)"]
end
PUB --> GI
PUB --> DI
GI --> GD
DI --> DD
GM --> GD
DM --> DD
CASS --> REDIS
style PUB fill:#12121c,stroke:#00f0ff,color:#fff
style GI fill:#12121c,stroke:#ff00aa,color:#fff
style GM fill:#12121c,stroke:#ff00aa,color:#fff
style GD fill:#12121c,stroke:#ff00aa,color:#fff
style DI fill:#12121c,stroke:#aa00ff,color:#fff
style DM fill:#12121c,stroke:#aa00ff,color:#fff
style DD fill:#12121c,stroke:#aa00ff,color:#fff
style CASS fill:#12121c,stroke:#00ff66,color:#fff
style REDIS fill:#12121c,stroke:#00ff66,color:#fff
Guild messages are sharded by guild_id in a dedicated guild-messages ES cell. BFGs (Big Freaking Guilds) get specialized multi-shard indices.
Direct messages are sharded by user_id in a separate user-dm-messages ES cell for isolation.
Zero-downtime reindexing for BFG migrations using parallel index writes during cutover.
ES stores attachment names and message text but only returns message_id, channel_id, guild_id to avoid data duplication with ScyllaDB.
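The routing rules above can be sketched as a function that picks a destination for each message and groups messages into per-destination batches. This is a simplified illustration: the message dictionaries, field names, and the use of the raw ID as the shard key are assumptions, and the real system resolves shards through the Cassandra and Redis mapping shown in the diagram.

```python
from collections import defaultdict

# Cell names taken from the text above; treating them as constants is an assumption.
GUILD_CELL = "guild-messages"
DM_CELL = "user-dm-messages"


def destination(message: dict) -> tuple:
    """Guild messages route by guild_id; direct messages route by user_id."""
    if message.get("guild_id") is not None:
        return (GUILD_CELL, message["guild_id"])
    return (DM_CELL, message["user_id"])


def batch_by_destination(messages: list) -> dict:
    """Group message IDs so each bulk request targets a single destination."""
    batches = defaultdict(list)
    for message in messages:
        batches[destination(message)].append(message["message_id"])
    return dict(batches)


def search_hit(message: dict) -> dict:
    """Return only the identifiers; the message body is fetched from the database."""
    return {
        "message_id": message["message_id"],
        "channel_id": message["channel_id"],
        "guild_id": message.get("guild_id"),
    }


incoming = [
    {"message_id": 1, "channel_id": 10, "guild_id": 100, "user_id": 7, "content": "hi"},
    {"message_id": 2, "channel_id": 11, "guild_id": 100, "user_id": 8, "content": "yo"},
    {"message_id": 3, "channel_id": 20, "guild_id": None, "user_id": 7, "content": "dm"},
    {"message_id": 4, "channel_id": 30, "guild_id": 200, "user_id": 9, "content": "hey"},
]
batches = batch_by_destination(incoming)
hit = search_hit(incoming[0])
```

Batching per destination means a failure in one cluster or index affects only that batch, which is the property the cell redesign is described as providing.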
Three backend services power voice: the Discord Gateway (WebSocket events), Discord Guilds (voice server assignment and state), and Discord Voice (signaling + SFU). The homegrown C++ SFU handles 2.6M concurrent voice users across 850+ servers in 13 regions.
graph TD
subgraph Client["Client"]
CWEB["Browser
(Native WebRTC)"]
CNAT["Desktop/Mobile
(Custom C++ Engine)"]
end
subgraph Control["Control Plane"]
GW["Discord Gateway
(WebSocket)"]
GUILDS["Discord Guilds
(Voice State)"]
SIG["Voice Signaling
(Keys + Stream IDs)"]
end
subgraph Media["Media Plane"]
SFU["C++ SFU
(Selective Forwarding)"]
ENC["Salsa20 Encryption
+ DAVE E2EE"]
OPUS["Opus Codec
48kHz Stereo"]
end
Client --> GW
GW --> GUILDS
GUILDS --> SIG
SIG --> SFU
CWEB --> SFU
CNAT --> SFU
SFU --> ENC
SFU --> OPUS
style CWEB fill:#1a1a28,stroke:#5865F2,color:#fff
style CNAT fill:#1a1a28,stroke:#5865F2,color:#fff
style GW fill:#12121c,stroke:#aa00ff,color:#fff
style GUILDS fill:#12121c,stroke:#aa00ff,color:#fff
style SIG fill:#12121c,stroke:#aa00ff,color:#fff
style SFU fill:#12121c,stroke:#0066ff,color:#fff
style ENC fill:#12121c,stroke:#ff0044,color:#fff
style OPUS fill:#12121c,stroke:#00f0ff,color:#fff
Discord replaces standard SDP signaling (~10KB) with a minimal ~1000 byte payload containing only server address, encryption method, codec, and stream ID. ICE negotiation is skipped entirely since all clients connect through relay servers, which also hides user IPs. DTLS/SRTP encryption is replaced with Salsa20 for performance. During silent periods, no audio is transmitted, requiring sequence number rewriting.
DAVE provides end-to-end encryption for audio and video in DMs, group DMs, voice channels, and Go Live streams, and has been enforced for all non-Stage voice calls since March 2, 2026. It uses WebRTC Encoded Transforms plus Messaging Layer Security (MLS) for group key exchange, with epoch-based key rotation whenever participants join or leave. The protocol is open-source and was externally audited by Trail of Bits.
Discord exposes two API surfaces: the HTTP REST API (Python monolith for CRUD) and the WebSocket Gateway (Elixir for real-time events). Bots can receive interactions via persistent gateway connection or outgoing webhooks to a configured URL, enabling serverless architectures.
graph TD
subgraph Discord["Discord Platform"]
REST["HTTP REST API
(Python)"]
GWAPI["WebSocket Gateway
(Elixir)"]
end
subgraph Gateway["Gateway Bot"]
GBOT["Bot Process
(Persistent WS)"]
GINT["INTERACTION_CREATE
Event"]
end
subgraph Webhook["Webhook Bot"]
WURL["Interactions Endpoint
(Configured URL)"]
LAMB["Lambda / Serverless"]
end
subgraph Limits["Rate Limiting"]
ROUTE["Per-Route
(X-RateLimit-Bucket)"]
GLOBAL["Global
(50 req/sec)"]
INVALID["Invalid Request
(10K/10min ban)"]
end
REST --> Gateway
GWAPI --> GBOT
GBOT --> GINT
REST --> Webhook
REST --> WURL
WURL --> LAMB
REST --> Limits
style REST fill:#12121c,stroke:#ffee00,color:#fff
style GWAPI fill:#12121c,stroke:#aa00ff,color:#fff
style GBOT fill:#1a1a28,stroke:#5865F2,color:#fff
style GINT fill:#1a1a28,stroke:#5865F2,color:#fff
style WURL fill:#1a1a28,stroke:#00f0ff,color:#fff
style LAMB fill:#1a1a28,stroke:#00f0ff,color:#fff
style ROUTE fill:#0a0a0f,stroke:#ff6600,color:#ff6600
style GLOBAL fill:#0a0a0f,stroke:#ff6600,color:#ff6600
style INVALID fill:#0a0a0f,stroke:#ff0044,color:#ff0044
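For the webhook path in the diagram above, a serverless function receives each interaction as an HTTP POST and answers in the response body. The sketch below shows only the dispatch step, using the interaction and response type numbers from Discord's public API documentation. The function name and reply text are hypothetical. Discord also requires the endpoint to verify the Ed25519 signature sent in the `X-Signature-Ed25519` and `X-Signature-Timestamp` headers; that step is omitted here because it needs a third-party cryptography library.

```python
import json

# Interaction types (request) and callback types (response).
PING = 1
APPLICATION_COMMAND = 2
PONG = 1
CHANNEL_MESSAGE_WITH_SOURCE = 4


def handle_interaction(body: str) -> dict:
    """Turn a raw interaction payload into the JSON response to send back.

    Signature verification must happen before this function is called.
    """
    interaction = json.loads(body)
    kind = interaction["type"]
    if kind == PING:
        # Discord pings the endpoint to validate it; reply with PONG.
        return {"type": PONG}
    if kind == APPLICATION_COMMAND:
        name = interaction["data"]["name"]
        return {
            "type": CHANNEL_MESSAGE_WITH_SOURCE,
            "data": {"content": f"Received /{name}"},
        }
    raise ValueError(f"unsupported interaction type: {kind}")


ping_reply = handle_interaction(json.dumps({"type": PING}))
command_reply = handle_interaction(
    json.dumps({"type": APPLICATION_COMMAND, "data": {"name": "hello"}})
)
```

Since the function keeps no connection state, it fits a Lambda-style deployment, whereas a gateway bot would receive the same payload as an INTERACTION_CREATE event over its WebSocket.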
| Rate Limit Tier | Scope | Limit |
|---|---|---|
| Per-Route | Endpoint + method, scoped by guild/channel/webhook | Varies (X-RateLimit-Bucket header) |
| Global | All endpoints per bot | 50 requests/second |
| Invalid Requests | 401/403/429 responses per IP | 10,000 per 10 minutes (then temp ban) |
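A client can respect the per-route tier in the table above by recording the rate limit headers from each response and waiting when a bucket is exhausted. The sketch below assumes the `X-RateLimit-Bucket`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset-After` response headers from Discord's public API documentation. The `RateLimitTracker` class, the route strings, and the explicit `now` argument are hypothetical choices made to keep the example deterministic, and the global limit is not modelled.

```python
class RateLimitTracker:
    """Track per-route buckets from response headers and report required waits."""

    def __init__(self) -> None:
        self.route_bucket: dict = {}   # route -> bucket id reported by the server
        self.buckets: dict = {}        # bucket id -> (remaining, reset_at)

    def record(self, route: str, headers: dict, now: float) -> None:
        """Store the bucket state carried by a response's headers."""
        bucket = headers.get("X-RateLimit-Bucket")
        if bucket is None:
            return                     # this response carried no bucket information
        remaining = int(headers["X-RateLimit-Remaining"])
        reset_at = now + float(headers["X-RateLimit-Reset-After"])
        self.route_bucket[route] = bucket
        self.buckets[bucket] = (remaining, reset_at)

    def delay(self, route: str, now: float) -> float:
        """Seconds to wait before the next request on `route` (0.0 if none)."""
        bucket = self.route_bucket.get(route)
        if bucket is None:
            return 0.0                 # nothing known yet: send and learn from the reply
        remaining, reset_at = self.buckets[bucket]
        if remaining > 0 or now >= reset_at:
            return 0.0
        return reset_at - now


tracker = RateLimitTracker()
exhausted = {
    "X-RateLimit-Bucket": "abcd1234",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset-After": "1.5",
}
tracker.record("POST /channels/1/messages", exhausted, now=100.0)
wait_now = tracker.delay("POST /channels/1/messages", now=100.5)
wait_later = tracker.delay("POST /channels/1/messages", now=102.0)
wait_unknown = tracker.delay("GET /users/@me", now=100.5)
```

Keying state by the bucket ID rather than by the route string lets two routes that report the same bucket share one counter, and waiting proactively avoids the 429 responses that count toward the invalid request limit.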
Discord serves media through two domains: cdn.discordapp.com for static originals and media.discordapp.net for the Rust Media Proxy that inspects, converts, and resizes every attachment and embedded image on the fly. The open-source Lilliput library handles image processing with WebP, AVIF, and GIF support.
graph LR
subgraph Upload["Upload"]
CLIENT["Client Upload"]
end
subgraph Process["Media Proxy · Rust"]
DETECT["Detection
(format, animation)"]
TRANSFORM["Transformation
(resize, convert)"]
OPTIMIZE["Optimization
(WebP/AVIF)"]
end
subgraph Delivery["Delivery"]
CDN["cdn.discordapp.com
(Static Originals)"]
MEDIA["media.discordapp.net
(Processed Assets)"]
end
CLIENT --> DETECT
DETECT --> TRANSFORM
TRANSFORM --> OPTIMIZE
OPTIMIZE --> CDN
OPTIMIZE --> MEDIA
style CLIENT fill:#1a1a28,stroke:#5865F2,color:#fff
style DETECT fill:#12121c,stroke:#ff6600,color:#fff
style TRANSFORM fill:#12121c,stroke:#ff6600,color:#fff
style OPTIMIZE fill:#12121c,stroke:#ff6600,color:#fff
style CDN fill:#12121c,stroke:#00f0ff,color:#fff
style MEDIA fill:#12121c,stroke:#00f0ff,color:#fff
WebP: 8-bit color (16M colors), superior compression, near-universal browser support. 29% median size reduction over GIF for animated emojis.
AVIF: Up to 12-bit color, HDR support, advanced compression. HDR content is tone-mapped to SDR when converting to 8-bit formats.
GIF: Legacy format retained for compatibility. 95%+ of animated emoji requests are now served as WebP instead.
The is_animated flag is propagated throughout all API systems and respects the user's Reduced Motion accessibility setting, ensuring animated content can be paused for users who need it.