Interactive architecture map of Spotify's engineering infrastructure — microservices, data pipelines, recommendation systems, and developer tooling compiled from publicly available sources.
Spotify's architecture is built on event-driven microservices running on Google Cloud Platform. Over 90 teams and 600+ developers manage 810+ active services, processing more than 1 trillion events per day across 1,800+ distinct event types.
graph TD
subgraph Clients["Client Layer"]
MOB["Mobile Apps
(iOS / Android)"]
WEB["Web Player"]
DSK["Desktop App"]
end
subgraph Edge["Edge & Delivery"]
CDN["Multi-CDN
(Akamai, CloudFront, Fastly)"]
API["API Gateway"]
end
subgraph Services["Microservices (810+)"]
STREAM["Audio
Streaming"]
SEARCH["Search"]
REC["Recommendations"]
USER["User &
Auth"]
PAY["Payments"]
POD["Podcast
Pipeline"]
end
subgraph Platform["Platform Layer"]
K8S["Kubernetes
(GCP)"]
EDI["Event Delivery
Infrastructure"]
BACK["Backstage
Dev Portal"]
end
Clients --> Edge
Edge --> Services
Services --> Platform
style MOB fill:#4B3F8C,stroke:#8B44AC,color:#E8E0F0
style CDN fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style API fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style REC fill:#E67E22,stroke:#E67E22,color:#E8E0F0
style K8S fill:#1A1530,stroke:#5DADE2,color:#E8E0F0
style EDI fill:#1A1530,stroke:#8B44AC,color:#E8E0F0
style BACK fill:#1A1530,stroke:#F1C40F,color:#E8E0F0
Each microservice owns its own database and handles a single domain concern. Services communicate via gRPC with Protobuf for synchronous calls and Google Cloud Pub/Sub for asynchronous event processing.
graph LR
subgraph Sync["Synchronous (gRPC)"]
SA["Service A"] -->|"gRPC / Protobuf"| SB["Service B"]
SB -->|"gRPC / Protobuf"| SC["Service C"]
end
subgraph Async["Asynchronous (Events)"]
SD["Service D"] -->|"publish"| PS["Cloud Pub/Sub
Topic"]
PS -->|"subscribe"| SE["Service E"]
PS -->|"subscribe"| SF["Service F"]
end
subgraph Discovery["Service Discovery"]
NL["Nameless
(registry)"]
end
SA -.->|"register"| NL
SB -.->|"register"| NL
style SA fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style SB fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style SC fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style SD fill:#4B3F8C,stroke:#8B44AC,color:#E8E0F0
style PS fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style SE fill:#4B3F8C,stroke:#8B44AC,color:#E8E0F0
style SF fill:#4B3F8C,stroke:#8B44AC,color:#E8E0F0
style NL fill:#E67E22,stroke:#E67E22,color:#E8E0F0
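The two interaction styles can be illustrated with a small, self-contained Python sketch. The in-memory `Topic` class is only a stand-in for a Cloud Pub/Sub topic, and every service and event name here is made up for illustration, not a Spotify API:

```python
from collections import defaultdict

class Topic:
    """In-memory stand-in for a Cloud Pub/Sub topic: fan-out to all subscribers."""
    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, event):
        # Every subscriber receives its own copy of the event.
        for handler in self.subscribers:
            handler(dict(event))

# Synchronous style: Service A calls Service B directly and blocks on the
# reply, as a gRPC call would (hypothetical service).
def track_metadata_service(track_id):
    return {"track_id": track_id, "title": "Example Track"}

# Asynchronous style: Service D publishes and moves on; E and F react
# independently without the publisher knowing they exist.
received = defaultdict(list)
playback_events = Topic("playback-completed")
playback_events.subscribe(lambda e: received["recommendations"].append(e))
playback_events.subscribe(lambda e: received["royalties"].append(e))

reply = track_metadata_service("t-123")          # sync: caller waits for the answer
playback_events.publish({"track_id": "t-123"})   # async: fire-and-forget fan-out
```

The design point is the decoupling in the async path: the publisher needs no knowledge of its consumers, so new subscribers can be added without touching the producing service.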
Originally built on Kafka 0.7, Spotify migrated to Google Cloud Pub/Sub in 2016-2017. The Event Service parses and routes events to per-type Pub/Sub topics. When teams define event schemas, Kubernetes operators automatically deploy queues, anonymization pipelines, and streaming jobs.
graph LR
subgraph Producers["Event Producers"]
APP["Client Apps"]
SVC["Backend
Services"]
end
subgraph EDI["Event Delivery Infrastructure"]
ES["Event Service
(parser + router)"]
SCHEMA["Schema
Registry"]
end
subgraph Transport["Transport"]
PUBSUB["Cloud Pub/Sub
(per-type topics)"]
end
subgraph Processing["Processing"]
ANON["Anonymization
Pipeline"]
ETL["ETL /
Dataflow Jobs"]
end
subgraph Storage["Storage"]
BQ["BigQuery"]
GCS["Cloud Storage"]
end
APP --> ES
SVC --> ES
ES --> SCHEMA
ES --> PUBSUB
PUBSUB --> ANON
ANON --> ETL
ETL --> BQ
ETL --> GCS
style ES fill:#8B44AC,stroke:#8B44AC,color:#E8E0F0
style PUBSUB fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style BQ fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style GCS fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style ANON fill:#C0392B,stroke:#C0392B,color:#E8E0F0
The Event Delivery Infrastructure processes 1+ trillion events per day (~70 TB compressed daily) across 1,800+ distinct event types. The largest single service handles ~10 million requests/second.
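The parse-and-route step can be sketched as follows. The schema registry contents and event types are hypothetical, and real validation is schema-driven (e.g. structured event definitions), not this simplified required-field check:

```python
from collections import defaultdict

# Hypothetical schema registry: event type -> required fields.
SCHEMA_REGISTRY = {
    "song_played":   {"user_id", "track_id", "ms_played"},
    "ad_impression": {"user_id", "ad_id"},
}

# Stand-in for per-type Pub/Sub topics.
topics = defaultdict(list)

def route_event(event):
    """Validate an event against its schema, then route it to its per-type topic."""
    event_type = event.get("type")
    required = SCHEMA_REGISTRY.get(event_type)
    if required is None or not required <= event.keys():
        # Unknown type or missing fields: park it for inspection.
        topics["dead-letter"].append(event)
        return "dead-letter"
    topics[event_type].append(event)
    return event_type

route_event({"type": "song_played", "user_id": "u1", "track_id": "t9", "ms_played": 30000})
route_event({"type": "song_played", "user_id": "u1"})    # missing required fields
route_event({"type": "unknown_event", "user_id": "u1"})  # unregistered type
```

Per-type topics mean each downstream consumer subscribes only to the event types it cares about, rather than filtering a single firehose.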
Introduced in 2012 by Henrik Kniberg and Anders Ivarsson in the whitepaper "Scaling Agile @ Spotify," Spotify's organizational model maps directly to how microservices are owned and operated. Each squad functions as a mini-startup with full autonomy, a textbook case of Conway's Law: the service architecture mirrors the communication structure of the teams that build it.
graph TD
subgraph Tribe1["Tribe: Mobile Player (<100 people)"]
SQ1["Squad: Playback
(6-12 people)"]
SQ2["Squad: Queue
Management"]
SQ3["Squad: Offline
Mode"]
end
subgraph Tribe2["Tribe: Content Platform"]
SQ4["Squad: Search
Experience"]
SQ5["Squad: Catalog
Ingestion"]
end
subgraph Chapters["Chapters (within tribe)"]
CH1["Backend
Engineers"]
CH2["iOS
Engineers"]
end
subgraph Guilds["Guilds (cross-tribe)"]
G1["AI Guild"]
G2["DevOps Guild"]
end
SQ1 -.-> CH1
SQ2 -.-> CH1
SQ3 -.-> CH2
SQ4 -.-> CH1
SQ1 -.-> G1
SQ4 -.-> G1
SQ5 -.-> G2
style SQ1 fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style SQ2 fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style SQ3 fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style SQ4 fill:#4B3F8C,stroke:#8B44AC,color:#E8E0F0
style SQ5 fill:#4B3F8C,stroke:#8B44AC,color:#E8E0F0
style CH1 fill:#E67E22,stroke:#E67E22,color:#E8E0F0
style CH2 fill:#E67E22,stroke:#E67E22,color:#E8E0F0
style G1 fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style G2 fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
- **Squad:** 6-12 people, cross-functional, fully autonomous. Owns one or more microservices end-to-end with a long-term mission.
- **Tribe:** A collection of squads grouped by business area, capped at ~100 people (Dunbar's number) and led by a Tribe Lead.
- **Chapter:** Same-skill engineers within a tribe, led by a Chapter Lead who serves as line manager.
- **Guild:** An informal, cross-tribe community of interest. Volunteer-led, with no formal hierarchy.
Built internally to manage 800+ microservices across 500+ engineering teams. Open-sourced in 2020, now a CNCF incubating project. Three-layer architecture: React frontend, Node.js backend, PostgreSQL database.
graph TD
subgraph Frontend["Frontend (React)"]
UI["Plugin-based UI
(extension tree)"]
end
subgraph Backend["Backend (Node.js)"]
CAT["Software
Catalog"]
SCAFF["Software
Templates"]
TDOC["TechDocs
(Markdown)"]
K8P["Kubernetes
Plugin"]
SRCH["Search"]
end
subgraph Data["Database Layer"]
PG["PostgreSQL
(production)"]
SQLITE["SQLite
(development)"]
end
subgraph External["External Integrations"]
GH["GitHub /
GitLab"]
K8S["Kubernetes
Clusters"]
CACHE["Redis /
Memcache"]
end
UI --> Backend
Backend --> PG
Backend --> SQLITE
CAT --> GH
SCAFF --> GH
K8P --> K8S
Backend --> CACHE
style UI fill:#F1C40F,stroke:#F1C40F,color:#0D0B14
style CAT fill:#8B44AC,stroke:#8B44AC,color:#E8E0F0
style SCAFF fill:#8B44AC,stroke:#8B44AC,color:#E8E0F0
style TDOC fill:#8B44AC,stroke:#8B44AC,color:#E8E0F0
style PG fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style GH fill:#1A1530,stroke:#5DADE2,color:#E8E0F0
| Plugin | Purpose | Type |
|---|---|---|
| Software Catalog | Central metadata repository for all services, APIs, libraries, and teams (YAML descriptors) | Core |
| Software Templates | Standardized project scaffolding — code skeletons, variable injection, VCS publish | Core |
| TechDocs | Docs-like-code: Markdown alongside code, rendered and searchable inside Backstage | Core |
| Kubernetes | View pod status, deployments, and logs for cataloged services | Plugin |
| Search | Unified search across catalog entities and documentation | Core |
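Catalog entries are YAML descriptors checked into each service's repository. A minimal sketch of one, following the documented Backstage format (the component name, owner, and repository slug below are made up):

```yaml
# catalog-info.yaml — registers a service in the Backstage Software Catalog
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: playback-service            # hypothetical service name
  description: Handles audio playback sessions
  annotations:
    github.com/project-slug: example-org/playback-service
spec:
  type: service
  lifecycle: production
  owner: squad-playback             # team that owns this component
```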
Spotify Portal for Backstage adds no-code setup, service maturity scoring, incident management integration, and advanced analytics on top of the open-source platform.
Spotify serves 50+ million tracks plus images and assets to 200+ million monthly active users through a multi-CDN strategy with adaptive bitrate streaming.
graph LR
subgraph Origin["Origin Storage"]
S3["AWS S3"]
GCS["Google Cloud
Storage"]
end
subgraph CDN["Multi-CDN Layer"]
AK["Akamai
(audio primary)"]
CF["AWS CloudFront
(audio secondary)"]
FAST["Fastly
(metadata, images)"]
end
subgraph Quality["Audio Formats"]
OGG["Ogg Vorbis
(primary)"]
AAC["AAC"]
end
subgraph Bitrate["Adaptive Bitrate"]
B96["96 kbps"]
B160["160 kbps"]
B320["320 kbps"]
end
subgraph Client["Clients"]
PLAY["Player"]
end
Origin --> CDN
OGG --> B96
OGG --> B160
OGG --> B320
CDN --> PLAY
Bitrate -.-> CDN
style S3 fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style GCS fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style AK fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style CF fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style FAST fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style PLAY fill:#8B44AC,stroke:#8B44AC,color:#E8E0F0
- **Akamai (audio primary):** Business-critical audio streaming with low latency and high bandwidth; the primary CDN tier for music playback.
- **Fastly (metadata, images):** Images, client updates, and metadata, cached intelligently with Varnish Configuration Language (VCL) rules.
- **Self-service tooling:** An internal CDN configuration tool combining Fastly APIs with VCL for team-managed caching rules.
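The adaptive-bitrate idea in the diagram reduces to picking the highest available encoding the connection can sustain with some headroom. A minimal sketch, using the Ogg Vorbis ladder from the diagram; the 0.8 headroom factor is an illustrative assumption, not a published Spotify parameter:

```python
# Ogg Vorbis bitrate ladder from the diagram, in kbps.
BITRATES = [96, 160, 320]

def select_bitrate(bandwidth_kbps, ladder=BITRATES, headroom=0.8):
    """Pick the highest bitrate that fits within a safety fraction of bandwidth.

    The headroom margin absorbs bandwidth fluctuation so playback is less
    likely to stall on rebuffering.
    """
    usable = bandwidth_kbps * headroom
    candidates = [b for b in ladder if b <= usable]
    return max(candidates) if candidates else min(ladder)

select_bitrate(5000)   # fast connection -> 320
select_bitrate(150)    # constrained connection -> 96
select_bitrate(50)     # below the ladder -> fall back to the lowest tier (96)
```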
Spotify's recommendation system powers Discover Weekly, Release Radar, Daily Mixes, and Wrapped. It fuses three core techniques: collaborative filtering, NLP-based content analysis, and deep learning on raw audio.
graph TD
subgraph Signals["Input Signals"]
LISTEN["Listening
History"]
PLAYLISTS["Playlist
Co-occurrence"]
SOCIAL["Web Crawl
(blogs, reviews)"]
AUDIO["Raw Audio
Spectrograms"]
end
subgraph Models["ML Models"]
CF["Collaborative
Filtering"]
NLP["NLP Content
Analysis"]
CNN["CNN Audio
Analysis"]
end
subgraph Ranking["Ranking Layer"]
RANK["ML Ranking
(GBDT / Neural)"]
CONTEXT["Context Features
(time, device)"]
end
subgraph Output["Output"]
DW["Discover
Weekly"]
RR["Release
Radar"]
DM["Daily
Mixes"]
end
LISTEN --> CF
PLAYLISTS --> CF
SOCIAL --> NLP
AUDIO --> CNN
CF --> RANK
NLP --> RANK
CNN --> RANK
CONTEXT --> RANK
RANK --> Output
style CF fill:#E67E22,stroke:#E67E22,color:#E8E0F0
style NLP fill:#E67E22,stroke:#E67E22,color:#E8E0F0
style CNN fill:#E67E22,stroke:#E67E22,color:#E8E0F0
style RANK fill:#C0392B,stroke:#C0392B,color:#E8E0F0
style DW fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style RR fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style DM fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
- **Collaborative filtering:** Analyzes billions of user-created playlists for co-occurrence patterns; matrix factorization at massive scale.
- **NLP content analysis:** Powered by The Echo Nest (acquired in 2014); crawls music blogs and reviews to build "cultural vectors" for artists.
- **CNN audio analysis:** Four convolutional and three fully connected layers analyze spectrograms, extracting tempo, key, energy, and danceability directly from the audio signal.
Each user's musical preferences are represented as a multi-dimensional vector, continuously updated from real-time streams of listening events (originally Kafka, now Cloud Pub/Sub). These profiles feed all ranking models with user-specific affinity scores.
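The taste-profile idea can be illustrated with plain cosine similarity between a user vector and candidate track vectors. The three dimensions and all values below are toy examples, not Spotify's actual embedding space:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy 3-dimensional taste space, e.g. (energy, acousticness, danceability).
user_profile = (0.9, 0.1, 0.8)   # continuously updated from listening events
tracks = {
    "edm_banger":  (0.95, 0.05, 0.9),
    "folk_ballad": (0.1, 0.9, 0.2),
}

# Rank candidate tracks by affinity to the user's profile.
ranked = sorted(tracks, key=lambda t: cosine(user_profile, tracks[t]), reverse=True)
```

In a real system these affinity scores would be one feature among many fed into the ranking layer, alongside contextual signals like time of day and device.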
Spotify's data platform handles 1+ trillion events/day (~70 TB compressed daily). After migrating from on-premise Hadoop to Google Cloud Platform in 2016, most pipelines are now written in Scio, Spotify's open-source Scala API for Apache Beam.
graph LR
subgraph Legacy["Era 1: On-Premise (Pre-2016)"]
KAFKA["Apache Kafka"]
HADOOP["Hadoop
(~2,500 nodes)"]
LUIGI["Luigi
(Python orchestrator)"]
HIVE["Apache Hive"]
CASS["Cassandra"]
end
subgraph Current["Era 2: Google Cloud (2016+)"]
PUBSUB["Cloud Pub/Sub"]
DATAFLOW["Cloud Dataflow
(Apache Beam)"]
BIGQ["BigQuery"]
BIGTABLE["Cloud Bigtable"]
GCSTORE["Cloud Storage"]
end
KAFKA -->|"replaced by"| PUBSUB
HADOOP -->|"replaced by"| DATAFLOW
HIVE -->|"replaced by"| BIGQ
CASS -->|"replaced by"| BIGTABLE
style KAFKA fill:#1A1530,stroke:#C0392B,color:#E8E0F0
style HADOOP fill:#1A1530,stroke:#C0392B,color:#E8E0F0
style PUBSUB fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style DATAFLOW fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style BIGQ fill:#8B44AC,stroke:#8B44AC,color:#E8E0F0
style BIGTABLE fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
| GCP Service | Replaces | Purpose |
|---|---|---|
| Cloud Pub/Sub | Apache Kafka | Event transport and queuing |
| Cloud Dataflow | Hadoop / Storm | Managed batch + streaming execution |
| BigQuery | Hive / HDFS | SQL analytics warehouse |
| Cloud Bigtable | Cassandra | High-speed key-value lookups |
| Cloud Storage | HDFS | Object / file storage |
| Cloud Spanner | PostgreSQL (some) | Transactional storage |
Scio provides a unified batch + streaming programming model with two core primitives: ParDo (parallel, per-element processing) and GroupByKey (shuffle). It ships native connectors for the GCP services in Spotify's stack (Pub/Sub, BigQuery, Bigtable, Cloud Storage) and powers most data pipelines at Spotify.
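The semantics of the two primitives can be mimicked in plain Python. Real Scio pipelines are Scala, distributed, and lazily executed; this sketch only demonstrates what the operations mean, using made-up event records:

```python
from itertools import chain

def par_do(fn, elements):
    """ParDo: apply fn to each element; fn may emit zero, one, or many outputs."""
    return list(chain.from_iterable(fn(e) for e in elements))

def group_by_key(pairs):
    """GroupByKey: shuffle (key, value) pairs into key -> [values]."""
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return grouped

# Count plays per track from a small event log, Beam-style.
events = [
    {"track": "t1", "ms_played": 30000},
    {"track": "t2", "ms_played": 1000},
    {"track": "t1", "ms_played": 45000},
]
# Emit a pair only for plays over 2 seconds (ParDo may drop elements).
pairs = par_do(lambda e: [(e["track"], 1)] if e["ms_played"] > 2000 else [], events)
plays = {k: sum(v) for k, v in group_by_key(pairs).items()}
```

In a distributed runner, GroupByKey is the expensive step: it forces a shuffle of data across workers, which is exactly what optimizations like SMB joins try to avoid.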
Wrapped 2019 was the largest Dataflow job ever run — 5x larger than 2018 at 75% of the cost. Wrapped 2020 introduced Sort Merge Bucket (SMB) joins to eliminate expensive shuffles, replacing Bigtable as an intermediate layer.
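The idea behind an SMB join is that when both datasets are written pre-sorted and bucketed by key, they can later be joined with a single linear merge instead of a shuffle. A single-bucket sketch with made-up record shapes (this simple version assumes unique keys per side; real implementations handle duplicate keys by grouping runs):

```python
def sort_merge_join(left, right):
    """Join two lists of (key, value) pairs that are already sorted by key.

    Runs in O(n + m) with no shuffle, which is the point of SMB: paying the
    sorting/bucketing cost once at write time makes every later join cheap.
    """
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            out.append((lk, left[i][1], right[j][1]))
            i += 1
            j += 1
    return out

# Both sides sorted by user ID (hypothetical records).
listens  = [("u1", "t9"), ("u2", "t3"), ("u4", "t7")]
profiles = [("u1", "US"), ("u3", "SE"), ("u4", "DE")]
joined = sort_merge_join(listens, profiles)
```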
Spotify's backend is polyglot — primarily Java and Python, with Scala for data pipelines and growing adoption of Kotlin and Go. All services run on Kubernetes on Google Cloud Platform.
graph TD
subgraph Languages["Languages & Frameworks"]
JAVA["Java
(Apollo, Spring)"]
PY["Python
(Luigi, ML)"]
SCALA["Scala
(Scio pipelines)"]
KTGO["Kotlin / Go
(newer services)"]
end
subgraph Databases["Databases"]
CASS["Cassandra
(playlists, user data)"]
BT["Bigtable
(recommendations)"]
PG["PostgreSQL
(payments)"]
ES["Elasticsearch
(search index)"]
end
subgraph Infra["Infrastructure"]
K8S["Kubernetes
(GCP)"]
DOCKER["Docker"]
GRPC["gRPC +
Protobuf"]
PUBS["Cloud
Pub/Sub"]
end
subgraph DevTools["Developer Tools"]
BSTG["Backstage"]
APOLLO["Apollo
Libraries"]
end
Languages --> Infra
Languages --> Databases
DevTools --> Languages
style JAVA fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
style PY fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style SCALA fill:#C0392B,stroke:#C0392B,color:#E8E0F0
style K8S fill:#4B3F8C,stroke:#8B44AC,color:#E8E0F0
style CASS fill:#E67E22,stroke:#E67E22,color:#E8E0F0
style ES fill:#F1C40F,stroke:#F1C40F,color:#0D0B14
style BSTG fill:#F1C40F,stroke:#F1C40F,color:#0D0B14
- **Apollo:** Spotify's open-source Java libraries for microservices: HTTP server, URI routing, and middleware. Used in production for years.
- **Luigi:** A Python-based workflow orchestrator born at Spotify. Manages complex pipeline dependencies and job scheduling.
- **Nameless:** Spotify's internal service discovery system. Services register on startup and become discoverable to receive traffic.
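Luigi's core job is dependency resolution: run each task only once everything it requires has finished. A standard-library sketch of that idea (the task names are hypothetical, and real Luigi tasks are classes with `requires()`, `output()`, and `run()` methods rather than a plain dict):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: task -> set of tasks it depends on.
pipeline = {
    "ingest_events":     set(),
    "clean_events":      {"ingest_events"},
    "train_model":       {"clean_events"},
    "top_tracks_report": {"clean_events"},
    "publish":           {"train_model", "top_tracks_report"},
}

# A valid execution order: every task appears after all of its dependencies.
order = list(TopologicalSorter(pipeline).static_order())

for task, deps in pipeline.items():
    assert all(order.index(d) < order.index(task) for d in deps)
```

Orchestrators add the rest on top of this core: checking which outputs already exist, retrying failures, and scheduling independent tasks in parallel.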
Spotify's podcast catalog grows by hundreds of thousands of episodes per day. Each episode passes through a DAG-driven pipeline of 6+ ML models for transcription, language detection, topic classification, and preview generation.
graph LR
subgraph Ingest["Ingestion"]
API["Central API
(new episodes)"]
DAG["DAG Router"]
end
subgraph ML["ML Enrichment (6+ models)"]
TRANS["Transcription"]
LANG["Language
Detection"]
SOUND["Sound Event
Detection"]
TOPIC["Topic
Classification"]
PREV["Preview
Generation"]
end
subgraph Infra["Infrastructure"]
KLIO["Klio Framework
(Apache Beam)"]
GPU["NVIDIA T4 GPUs
(16GB each)"]
DF["Cloud Dataflow
(autoscaling)"]
end
subgraph Delivery["Delivery"]
CDN["Multi-CDN
(Akamai, CloudFront)"]
ELIDX["Elasticsearch
(metadata index)"]
end
API --> DAG
DAG --> ML
ML --> KLIO
KLIO --> GPU
KLIO --> DF
ML --> CDN
ML --> ELIDX
style API fill:#4B3F8C,stroke:#8B44AC,color:#E8E0F0
style DAG fill:#4B3F8C,stroke:#8B44AC,color:#E8E0F0
style TRANS fill:#E67E22,stroke:#E67E22,color:#E8E0F0
style KLIO fill:#1A9E6F,stroke:#1A9E6F,color:#E8E0F0
style GPU fill:#C0392B,stroke:#C0392B,color:#E8E0F0
style CDN fill:#1A4FA0,stroke:#1A4FA0,color:#E8E0F0
Spotify's open-source audio processing framework built on Apache Beam. Switching from batch to streaming deployment reduced median preview generation latency from 111.7 minutes to 3.7 minutes — a 30x improvement.
| Component | Technology | Purpose |
|---|---|---|
| ML Frameworks | TensorFlow, PyTorch, Scikit-learn, Gensim | Ensemble of 6+ models per episode |
| GPU Hardware | NVIDIA T4 (16GB) | Model inference with fusion breaks for swapping |
| Orchestration | Cloud Dataflow | Dynamic autoscaling for batch + streaming |
| Packaging | Poetry + Docker | Dependency management and containerization |
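The per-episode flow above can be pictured as a chain of enrichment stages over the episode record. The stage names mirror the diagram, but the implementations here are trivial stubs standing in for the real TensorFlow/PyTorch models:

```python
# Stub stages mirroring the diagram; each one enriches the episode record.
def transcribe(ep):
    ep["transcript"] = f"transcript of {ep['id']}"
    return ep

def detect_language(ep):
    ep["language"] = "en"
    return ep

def classify_topics(ep):
    ep["topics"] = ["technology"]
    return ep

def generate_preview(ep):
    ep["preview"] = ep["transcript"][:20]
    return ep

# A linear slice of the DAG; the real router can fan out independent stages.
PIPELINE = [transcribe, detect_language, classify_topics, generate_preview]

def enrich(episode):
    for stage in PIPELINE:
        episode = stage(episode)
    return episode

episode = enrich({"id": "ep-42"})
```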