Interactive architecture map of Spotify's engineering infrastructure — microservices, data pipelines, recommendation systems, and developer tooling compiled from publicly available sources.
Spotify's architecture is built on event-driven microservices running on Google Cloud Platform. Over 90 teams and 600+ developers manage 810+ active services, processing more than 1 trillion events per day across 1,800+ distinct event types.
graph TD
subgraph Clients["Client Layer"]
MOB["Mobile Apps
(iOS / Android)"]
WEB["Web Player"]
DSK["Desktop App"]
end
subgraph Edge["Edge & Delivery"]
CDN["Multi-CDN
(Akamai, CloudFront, Fastly)"]
API["API Gateway"]
end
subgraph Services["Microservices (810+)"]
STREAM["Audio
Streaming"]
SEARCH["Search"]
REC["Recommendations"]
USER["User &
Auth"]
PAY["Payments"]
POD["Podcast
Pipeline"]
end
subgraph Platform["Platform Layer"]
K8S["Kubernetes
(GCP)"]
EDI["Event Delivery
Infrastructure"]
BACK["Backstage
Dev Portal"]
end
Clients --> Edge
Edge --> Services
Services --> Platform
style MOB fill:#4B3F8C,stroke:#8B44AC,color:#E8E0F0
style CDN fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style API fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style REC fill:#E67E22,stroke:#E67E22,color:#E8E0F0
style K8S fill:#1A1530,stroke:#5DADE2,color:#E8E0F0
style EDI fill:#1A1530,stroke:#8B44AC,color:#E8E0F0
style BACK fill:#1A1530,stroke:#F1C40F,color:#E8E0F0
Each microservice owns its own database and handles a single domain concern. Services communicate via gRPC with Protobuf for synchronous calls and Google Cloud Pub/Sub for asynchronous event processing.
graph LR
subgraph Sync["Synchronous (gRPC)"]
SA["Service A"] -->|"gRPC / Protobuf"| SB["Service B"]
SB -->|"gRPC / Protobuf"| SC["Service C"]
end
subgraph Async["Asynchronous (Events)"]
SD["Service D"] -->|"publish"| PS["Cloud Pub/Sub
Topic"]
PS -->|"subscribe"| SE["Service E"]
PS -->|"subscribe"| SF["Service F"]
end
subgraph Discovery["Service Discovery"]
NL["Nameless
(registry)"]
end
SA -.->|"register"| NL
SB -.->|"register"| NL
style SA fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style SB fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style SC fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style SD fill:#4B3F8C,stroke:#8B44AC,color:#E8E0F0
style PS fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style SE fill:#4B3F8C,stroke:#8B44AC,color:#E8E0F0
style SF fill:#4B3F8C,stroke:#8B44AC,color:#E8E0F0
style NL fill:#E67E22,stroke:#E67E22,color:#E8E0F0
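The two interaction styles can be illustrated with a small, self-contained Python sketch. The in-memory `Topic` class is only a stand-in for a Cloud Pub/Sub topic, and every service and event name here is made up for illustration, not a Spotify API:

```python
from collections import defaultdict

class Topic:
    """In-memory stand-in for a Cloud Pub/Sub topic: fan-out to all subscribers."""
    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, event):
        # Every subscriber receives its own copy of the event.
        for handler in self.subscribers:
            handler(dict(event))

# Synchronous style: Service A calls Service B directly and blocks on the
# reply, as a gRPC call would (hypothetical service).
def track_metadata_service(track_id):
    return {"track_id": track_id, "title": "Example Track"}

# Asynchronous style: Service D publishes and moves on; E and F react
# independently without the publisher knowing they exist.
received = defaultdict(list)
playback_events = Topic("playback-completed")
playback_events.subscribe(lambda e: received["recommendations"].append(e))
playback_events.subscribe(lambda e: received["royalties"].append(e))

reply = track_metadata_service("t-123")          # sync: caller waits for the answer
playback_events.publish({"track_id": "t-123"})   # async: fire-and-forget fan-out
```

The design point is the decoupling in the async path: the publisher needs no knowledge of its consumers, so new subscribers can be added without touching the producing service.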
Originally built on Kafka 0.7, Spotify migrated to Google Cloud Pub/Sub in 2016-2017. The Event Service parses and routes events to per-type Pub/Sub topics. When teams define event schemas, Kubernetes operators automatically deploy queues, anonymization pipelines, and streaming jobs.
graph LR
subgraph Producers["Event Producers"]
APP["Client Apps"]
SVC["Backend
Services"]
end
subgraph EDI["Event Delivery Infrastructure"]
ES["Event Service
(parser + router)"]
SCHEMA["Schema
Registry"]
end
subgraph Transport["Transport"]
PUBSUB["Cloud Pub/Sub
(per-type topics)"]
end
subgraph Processing["Processing"]
ANON["Anonymization
Pipeline"]
ETL["ETL /
Dataflow Jobs"]
end
subgraph Storage["Storage"]
BQ["BigQuery"]
GCS["Cloud Storage"]
end
APP --> ES
SVC --> ES
ES --> SCHEMA
ES --> PUBSUB
PUBSUB --> ANON
ANON --> ETL
ETL --> BQ
ETL --> GCS
style ES fill:#8B44AC,stroke:#8B44AC,color:#E8E0F0
style PUBSUB fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style BQ fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style GCS fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style ANON fill:#C0392B,stroke:#C0392B,color:#E8E0F0
The Event Delivery Infrastructure processes 1+ trillion events per day (~70 TB compressed daily) across 1,800+ distinct event types. The largest single service handles ~10 million requests/second.
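The parse-and-route step can be sketched as follows. The schema registry contents and event types are hypothetical, and real validation is schema-driven (e.g. structured event definitions), not this simplified required-field check:

```python
from collections import defaultdict

# Hypothetical schema registry: event type -> required fields.
SCHEMA_REGISTRY = {
    "song_played":   {"user_id", "track_id", "ms_played"},
    "ad_impression": {"user_id", "ad_id"},
}

# Stand-in for per-type Pub/Sub topics.
topics = defaultdict(list)

def route_event(event):
    """Validate an event against its schema, then route it to its per-type topic."""
    event_type = event.get("type")
    required = SCHEMA_REGISTRY.get(event_type)
    if required is None or not required <= event.keys():
        # Unknown type or missing fields: park it for inspection.
        topics["dead-letter"].append(event)
        return "dead-letter"
    topics[event_type].append(event)
    return event_type

route_event({"type": "song_played", "user_id": "u1", "track_id": "t9", "ms_played": 30000})
route_event({"type": "song_played", "user_id": "u1"})    # missing required fields
route_event({"type": "unknown_event", "user_id": "u1"})  # unregistered type
```

Per-type topics mean each downstream consumer subscribes only to the event types it cares about, rather than filtering a single firehose.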
Introduced in 2012 by Henrik Kniberg and Anders Ivarsson in the whitepaper "Scaling Agile @ Spotify," Spotify's organizational model maps directly to how microservices are owned and operated. Each squad functions as a mini-startup with full autonomy, a textbook case of Conway's Law: the service architecture mirrors the communication structure of the teams that build it.
graph TD
subgraph Tribe1["Tribe: Mobile Player (<100 people)"]
SQ1["Squad: Playback
(6-12 people)"]
SQ2["Squad: Queue
Management"]
SQ3["Squad: Offline
Mode"]
end
subgraph Tribe2["Tribe: Content Platform"]
SQ4["Squad: Search
Experience"]
SQ5["Squad: Catalog
Ingestion"]
end
subgraph Chapters["Chapters (within tribe)"]
CH1["Backend
Engineers"]
CH2["iOS
Engineers"]
end
subgraph Guilds["Guilds (cross-tribe)"]
G1["AI Guild"]
G2["DevOps Guild"]
end
SQ1 -.-> CH1
SQ2 -.-> CH1
SQ3 -.-> CH2
SQ4 -.-> CH1
SQ1 -.-> G1
SQ4 -.-> G1
SQ5 -.-> G2
style SQ1 fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style SQ2 fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style SQ3 fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style SQ4 fill:#4B3F8C,stroke:#8B44AC,color:#E8E0F0
style SQ5 fill:#4B3F8C,stroke:#8B44AC,color:#E8E0F0
style CH1 fill:#E67E22,stroke:#E67E22,color:#E8E0F0
style CH2 fill:#E67E22,stroke:#E67E22,color:#E8E0F0
style G1 fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style G2 fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
- **Squad:** 6-12 people, cross-functional, fully autonomous. Owns one or more microservices end-to-end with a long-term mission.
- **Tribe:** A collection of squads grouped by business area, capped at ~100 people (Dunbar's number) and led by a Tribe Lead.
- **Chapter:** Same-skill engineers within a tribe, led by a Chapter Lead who serves as line manager.
- **Guild:** An informal, cross-tribe community of interest. Volunteer-led, with no formal hierarchy.
Built internally to manage 800+ microservices across 500+ engineering teams. Open-sourced in 2020, now a CNCF incubating project. Three-layer architecture: React frontend, Node.js backend, PostgreSQL database.
graph TD
subgraph Frontend["Frontend (React)"]
UI["Plugin-based UI
(extension tree)"]
end
subgraph Backend["Backend (Node.js)"]
CAT["Software
Catalog"]
SCAFF["Software
Templates"]
TDOC["TechDocs
(Markdown)"]
K8P["Kubernetes
Plugin"]
SRCH["Search"]
end
subgraph Data["Database Layer"]
PG["PostgreSQL
(production)"]
SQLITE["SQLite
(development)"]
end
subgraph External["External Integrations"]
GH["GitHub /
GitLab"]
K8S["Kubernetes
Clusters"]
CACHE["Redis /
Memcache"]
end
UI --> Backend
Backend --> PG
Backend --> SQLITE
CAT --> GH
SCAFF --> GH
K8P --> K8S
Backend --> CACHE
style UI fill:#F1C40F,stroke:#F1C40F,color:#0D0B14
style CAT fill:#8B44AC,stroke:#8B44AC,color:#E8E0F0
style SCAFF fill:#8B44AC,stroke:#8B44AC,color:#E8E0F0
style TDOC fill:#8B44AC,stroke:#8B44AC,color:#E8E0F0
style PG fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style GH fill:#1A1530,stroke:#5DADE2,color:#E8E0F0
| Plugin | Purpose | Type |
|---|---|---|
| Software Catalog | Central metadata repository for all services, APIs, libraries, and teams (YAML descriptors) | Core |
| Software Templates | Standardized project scaffolding — code skeletons, variable injection, VCS publish | Core |
| TechDocs | Docs-like-code: Markdown alongside code, rendered and searchable inside Backstage | Core |
| Kubernetes | View pod status, deployments, and logs for cataloged services | Plugin |
| Search | Unified search across catalog entities and documentation | Core |
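Catalog entries are YAML descriptors checked into each service's repository. A minimal sketch of one, following the documented Backstage format (the component name, owner, and repository slug below are made up):

```yaml
# catalog-info.yaml — registers a service in the Backstage Software Catalog
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: playback-service            # hypothetical service name
  description: Handles audio playback sessions
  annotations:
    github.com/project-slug: example-org/playback-service
spec:
  type: service
  lifecycle: production
  owner: squad-playback             # team that owns this component
```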
Spotify Portal for Backstage adds no-code setup, service maturity scoring, incident management integration, and advanced analytics on top of the open-source platform.
Spotify serves 50+ million tracks plus images and assets to 200+ million monthly active users through a multi-CDN strategy with adaptive bitrate streaming.
graph LR
subgraph Origin["Origin Storage"]
S3["AWS S3"]
GCS["Google Cloud
Storage"]
end
subgraph CDN["Multi-CDN Layer"]
AK["Akamai
(audio primary)"]
CF["AWS CloudFront
(audio secondary)"]
FAST["Fastly
(metadata, images)"]
end
subgraph Quality["Audio Formats"]
OGG["Ogg Vorbis
(primary)"]
AAC["AAC"]
end
subgraph Bitrate["Adaptive Bitrate"]
B96["96 kbps"]
B160["160 kbps"]
B320["320 kbps"]
end
subgraph Client["Clients"]
PLAY["Player"]
end
Origin --> CDN
OGG --> B96
OGG --> B160
OGG --> B320
CDN --> PLAY
Bitrate -.-> CDN
style S3 fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style GCS fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style AK fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style CF fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style FAST fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style PLAY fill:#8B44AC,stroke:#8B44AC,color:#E8E0F0
- **Akamai (audio primary):** Business-critical audio streaming with low latency and high bandwidth; the primary CDN tier for music playback.
- **Fastly (metadata, images):** Images, client updates, and metadata, cached intelligently with Varnish Configuration Language (VCL) rules.
- **Self-service tooling:** An internal CDN configuration tool combining Fastly APIs with VCL for team-managed caching rules.
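The adaptive-bitrate idea in the diagram reduces to picking the highest available encoding the connection can sustain with some headroom. A minimal sketch, using the Ogg Vorbis ladder from the diagram; the 0.8 headroom factor is an illustrative assumption, not a published Spotify parameter:

```python
# Ogg Vorbis bitrate ladder from the diagram, in kbps.
BITRATES = [96, 160, 320]

def select_bitrate(bandwidth_kbps, ladder=BITRATES, headroom=0.8):
    """Pick the highest bitrate that fits within a safety fraction of bandwidth.

    The headroom margin absorbs bandwidth fluctuation so playback is less
    likely to stall on rebuffering.
    """
    usable = bandwidth_kbps * headroom
    candidates = [b for b in ladder if b <= usable]
    return max(candidates) if candidates else min(ladder)

select_bitrate(5000)   # fast connection -> 320
select_bitrate(150)    # constrained connection -> 96
select_bitrate(50)     # below the ladder -> fall back to the lowest tier (96)
```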
Spotify's recommendation system powers Discover Weekly, Release Radar, Daily Mixes, and Wrapped. It fuses three core techniques: collaborative filtering, NLP-based content analysis, and deep learning on raw audio.
graph TD
subgraph Signals["Input Signals"]
LISTEN["Listening
History"]
PLAYLISTS["Playlist
Co-occurrence"]
SOCIAL["Web Crawl
(blogs, reviews)"]
AUDIO["Raw Audio
Spectrograms"]
end
subgraph Models["ML Models"]
CF["Collaborative
Filtering"]
NLP["NLP Content
Analysis"]
CNN["CNN Audio
Analysis"]
end
subgraph Ranking["Ranking Layer"]
RANK["ML Ranking
(GBDT / Neural)"]
CONTEXT["Context Features
(time, device)"]
end
subgraph Output["Output"]
DW["Discover
Weekly"]
RR["Release
Radar"]
DM["Daily
Mixes"]
end
LISTEN --> CF
PLAYLISTS --> CF
SOCIAL --> NLP
AUDIO --> CNN
CF --> RANK
NLP --> RANK
CNN --> RANK
CONTEXT --> RANK
RANK --> Output
style CF fill:#E67E22,stroke:#E67E22,color:#E8E0F0
style NLP fill:#E67E22,stroke:#E67E22,color:#E8E0F0
style CNN fill:#E67E22,stroke:#E67E22,color:#E8E0F0
style RANK fill:#C0392B,stroke:#C0392B,color:#E8E0F0
style DW fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style RR fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style DM fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
- **Collaborative filtering:** Analyzes billions of user-created playlists for co-occurrence patterns; matrix factorization at massive scale.
- **NLP content analysis:** Powered by The Echo Nest (acquired in 2014); crawls music blogs and reviews to build "cultural vectors" for artists.
- **CNN audio analysis:** Four convolutional and three fully connected layers analyze spectrograms, extracting tempo, key, energy, and danceability directly from the audio signal.
Each user's musical preferences are represented as a multi-dimensional vector, continuously updated from real-time streams of listening events (originally Kafka, now Cloud Pub/Sub). These profiles feed all ranking models with user-specific affinity scores.
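The taste-profile idea can be illustrated with plain cosine similarity between a user vector and candidate track vectors. The three dimensions and all values below are toy examples, not Spotify's actual embedding space:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy 3-dimensional taste space, e.g. (energy, acousticness, danceability).
user_profile = (0.9, 0.1, 0.8)   # continuously updated from listening events
tracks = {
    "edm_banger":  (0.95, 0.05, 0.9),
    "folk_ballad": (0.1, 0.9, 0.2),
}

# Rank candidate tracks by affinity to the user's profile.
ranked = sorted(tracks, key=lambda t: cosine(user_profile, tracks[t]), reverse=True)
```

In a real system these affinity scores would be one feature among many fed into the ranking layer, alongside contextual signals like time of day and device.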
Spotify's data platform handles 1+ trillion events/day (~70 TB compressed daily). After migrating from on-premise Hadoop to Google Cloud Platform in 2016, most pipelines are now written in Scio, Spotify's open-source Scala API for Apache Beam.
graph LR
subgraph Legacy["Era 1: On-Premise (Pre-2016)"]
KAFKA["Apache Kafka"]
HADOOP["Hadoop
(~2,500 nodes)"]
LUIGI["Luigi
(Python orchestrator)"]
HIVE["Apache Hive"]
CASS["Cassandra"]
end
subgraph Current["Era 2: Google Cloud (2016+)"]
PUBSUB["Cloud Pub/Sub"]
DATAFLOW["Cloud Dataflow
(Apache Beam)"]
BIGQ["BigQuery"]
BIGTABLE["Cloud Bigtable"]
GCSTORE["Cloud Storage"]
end
KAFKA -->|"replaced by"| PUBSUB
HADOOP -->|"replaced by"| DATAFLOW
HIVE -->|"replaced by"| BIGQ
CASS -->|"replaced by"| BIGTABLE
style KAFKA fill:#1A1530,stroke:#C0392B,color:#E8E0F0
style HADOOP fill:#1A1530,stroke:#C0392B,color:#E8E0F0
style PUBSUB fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style DATAFLOW fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style BIGQ fill:#8B44AC,stroke:#8B44AC,color:#E8E0F0
style BIGTABLE fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
| GCP Service | Replaces | Purpose |
|---|---|---|
| Cloud Pub/Sub | Apache Kafka | Event transport and queuing |
| Cloud Dataflow | Hadoop / Storm | Managed batch + streaming execution |
| BigQuery | Hive / HDFS | SQL analytics warehouse |
| Cloud Bigtable | Cassandra | High-speed key-value lookups |
| Cloud Storage | HDFS | Object / file storage |
| Cloud Spanner | PostgreSQL (some) | Transactional storage |
Scio provides a unified batch + streaming programming model with two core primitives: ParDo (parallel, per-element processing) and GroupByKey (shuffle). It ships native connectors for the GCP services in Spotify's stack (Pub/Sub, BigQuery, Bigtable, Cloud Storage) and powers most data pipelines at Spotify.
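The semantics of the two primitives can be mimicked in plain Python. Real Scio pipelines are Scala, distributed, and lazily executed; this sketch only demonstrates what the operations mean, using made-up event records:

```python
from itertools import chain

def par_do(fn, elements):
    """ParDo: apply fn to each element; fn may emit zero, one, or many outputs."""
    return list(chain.from_iterable(fn(e) for e in elements))

def group_by_key(pairs):
    """GroupByKey: shuffle (key, value) pairs into key -> [values]."""
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return grouped

# Count plays per track from a small event log, Beam-style.
events = [
    {"track": "t1", "ms_played": 30000},
    {"track": "t2", "ms_played": 1000},
    {"track": "t1", "ms_played": 45000},
]
# Emit a pair only for plays over 2 seconds (ParDo may drop elements).
pairs = par_do(lambda e: [(e["track"], 1)] if e["ms_played"] > 2000 else [], events)
plays = {k: sum(v) for k, v in group_by_key(pairs).items()}
```

In a distributed runner, GroupByKey is the expensive step: it forces a shuffle of data across workers, which is exactly what optimizations like SMB joins try to avoid.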
Wrapped 2019 was the largest Dataflow job ever run — 5x larger than 2018 at 75% of the cost. Wrapped 2020 introduced Sort Merge Bucket (SMB) joins to eliminate expensive shuffles, replacing Bigtable as an intermediate layer.
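The idea behind an SMB join is that when both datasets are written pre-sorted and bucketed by key, they can later be joined with a single linear merge instead of a shuffle. A single-bucket sketch with made-up record shapes (this simple version assumes unique keys per side; real implementations handle duplicate keys by grouping runs):

```python
def sort_merge_join(left, right):
    """Join two lists of (key, value) pairs that are already sorted by key.

    Runs in O(n + m) with no shuffle, which is the point of SMB: paying the
    sorting/bucketing cost once at write time makes every later join cheap.
    """
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            out.append((lk, left[i][1], right[j][1]))
            i += 1
            j += 1
    return out

# Both sides sorted by user ID (hypothetical records).
listens  = [("u1", "t9"), ("u2", "t3"), ("u4", "t7")]
profiles = [("u1", "US"), ("u3", "SE"), ("u4", "DE")]
joined = sort_merge_join(listens, profiles)
```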
Spotify's backend is polyglot — primarily Java and Python, with Scala for data pipelines and growing adoption of Kotlin and Go. All services run on Kubernetes on Google Cloud Platform.
graph TD
subgraph Languages["Languages & Frameworks"]
JAVA["Java
(Apollo, Spring)"]
PY["Python
(Luigi, ML)"]
SCALA["Scala
(Scio pipelines)"]
KTGO["Kotlin / Go
(newer services)"]
end
subgraph Databases["Databases"]
CASS["Cassandra
(playlists, user data)"]
BT["Bigtable
(recommendations)"]
PG["PostgreSQL
(payments)"]
ES["Elasticsearch
(search index)"]
end
subgraph Infra["Infrastructure"]
K8S["Kubernetes
(GCP)"]
DOCKER["Docker"]
GRPC["gRPC +
Protobuf"]
PUBS["Cloud
Pub/Sub"]
end
subgraph DevTools["Developer Tools"]
BSTG["Backstage"]
APOLLO["Apollo
Libraries"]
end
Languages --> Infra
Languages --> Databases
DevTools --> Languages
style JAVA fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style PY fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style SCALA fill:#C0392B,stroke:#C0392B,color:#E8E0F0
style K8S fill:#4B3F8C,stroke:#8B44AC,color:#E8E0F0
style CASS fill:#E67E22,stroke:#E67E22,color:#E8E0F0
style ES fill:#F1C40F,stroke:#F1C40F,color:#0D0B14
style BSTG fill:#F1C40F,stroke:#F1C40F,color:#0D0B14
- **Apollo:** Spotify's open-source Java libraries for microservices: HTTP server, URI routing, and middleware. Used in production for years.
- **Luigi:** A Python-based workflow orchestrator born at Spotify. Manages complex pipeline dependencies and job scheduling.
- **Nameless:** Spotify's internal service discovery system. Services register on startup and become discoverable to receive traffic.
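Luigi's core job is dependency resolution: run each task only once everything it requires has finished. A standard-library sketch of that idea (the task names are hypothetical, and real Luigi tasks are classes with `requires()`, `output()`, and `run()` methods rather than a plain dict):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: task -> set of tasks it depends on.
pipeline = {
    "ingest_events":     set(),
    "clean_events":      {"ingest_events"},
    "train_model":       {"clean_events"},
    "top_tracks_report": {"clean_events"},
    "publish":           {"train_model", "top_tracks_report"},
}

# A valid execution order: every task appears after all of its dependencies.
order = list(TopologicalSorter(pipeline).static_order())

for task, deps in pipeline.items():
    assert all(order.index(d) < order.index(task) for d in deps)
```

Orchestrators add the rest on top of this core: checking which outputs already exist, retrying failures, and scheduling independent tasks in parallel.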
Spotify's podcast catalog grows by hundreds of thousands of episodes per day. Each episode passes through a DAG-driven pipeline of 6+ ML models for transcription, language detection, topic classification, and preview generation.
graph LR
subgraph Ingest["Ingestion"]
API["Central API
(new episodes)"]
DAG["DAG Router"]
end
subgraph ML["ML Enrichment (6+ models)"]
TRANS["Transcription"]
LANG["Language
Detection"]
SOUND["Sound Event
Detection"]
TOPIC["Topic
Classification"]
PREV["Preview
Generation"]
end
subgraph Infra["Infrastructure"]
KLIO["Klio Framework
(Apache Beam)"]
GPU["NVIDIA T4 GPUs
(16GB each)"]
DF["Cloud Dataflow
(autoscaling)"]
end
subgraph Delivery["Delivery"]
CDN["Multi-CDN
(Akamai, CloudFront)"]
ELIDX["Elasticsearch
(metadata index)"]
end
API --> DAG
DAG --> ML
ML --> KLIO
KLIO --> GPU
KLIO --> DF
ML --> CDN
ML --> ELIDX
style API fill:#4B3F8C,stroke:#8B44AC,color:#E8E0F0
style DAG fill:#4B3F8C,stroke:#8B44AC,color:#E8E0F0
style TRANS fill:#E67E22,stroke:#E67E22,color:#E8E0F0
style KLIO fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style GPU fill:#C0392B,stroke:#C0392B,color:#E8E0F0
style CDN fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
Spotify's open-source audio processing framework built on Apache Beam. Switching from batch to streaming deployment reduced median preview generation latency from 111.7 minutes to 3.7 minutes — a 30x improvement.
| Component | Technology | Purpose |
|---|---|---|
| ML Frameworks | TensorFlow, PyTorch, Scikit-learn, Gensim | Ensemble of 6+ models per episode |
| GPU Hardware | NVIDIA T4 (16GB) | Model inference with fusion breaks for swapping |
| Orchestration | Cloud Dataflow | Dynamic autoscaling for batch + streaming |
| Packaging | Poetry + Docker | Dependency management and containerization |
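The per-episode flow above can be pictured as a chain of enrichment stages over the episode record. The stage names mirror the diagram, but the implementations here are trivial stubs standing in for the real TensorFlow/PyTorch models:

```python
# Stub stages mirroring the diagram; each one enriches the episode record.
def transcribe(ep):
    ep["transcript"] = f"transcript of {ep['id']}"
    return ep

def detect_language(ep):
    ep["language"] = "en"
    return ep

def classify_topics(ep):
    ep["topics"] = ["technology"]
    return ep

def generate_preview(ep):
    ep["preview"] = ep["transcript"][:20]
    return ep

# A linear slice of the DAG; the real router can fan out independent stages.
PIPELINE = [transcribe, detect_language, classify_topics, generate_preview]

def enrich(episode):
    for stage in PIPELINE:
        episode = stage(episode)
    return episode

episode = enrich({"id": "ep-42"})
```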