Pressing play

Niklas Gustavsson
ngn@spotify.com
@protocol7

April 17, 2012
Who am I?
• ngn@spotify.com
• @protocol7
• Spotify backend dev based in Göteborg
• Mainly from a JVM background, working on various stuff over the years
• Apache Software Foundation member
What’s Spotify all about?
• A big catalogue, tons of music
• Available everywhere
• Great user experience
• More convenient than piracy
• Fast, reliable, always available
• Scalable for many, many users
• Ad-supported or paid-for service
Where’s Spotify?
• Let’s start the client, but where should it connect to?
Aside: SRV records
• Example SRV records:

    _spotify-mac-client._tcp.spotify.com. 242 IN SRV 10   8      4070 C8.spotify.com.
    _spotify-mac-client._tcp.spotify.com. 242 IN SRV 10   16     4070 C4.spotify.com.
    name                                  TTL class  prio weight port host

• GeoDNS is used
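A minimal sketch of how a client might look these records up, assuming the third-party dnspython library (the real clients have their own resolver logic):

    # Sketch: resolve the SRV records for the Mac client.
    # Requires dnspython; older versions use dns.resolver.query instead.
    import dns.resolver

    answers = dns.resolver.resolve('_spotify-mac-client._tcp.spotify.com', 'SRV')
    for rr in answers:
        # Each record carries priority, weight, port and target host.
        print(rr.priority, rr.weight, rr.port, rr.target)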
What does that record really point to?
• accesspoint
• Handles authentication state, logging, routing, rate limiting and much more
• Protocol between client and AP uses a single, encrypted, multiplexed socket over TCP
• Written in C++
Services
• Probably close to 100 backend services, most of them small, each handling a single task
• UNIX philosophy
• Many are autonomous
• Deployed on commodity servers
• Always redundant
Services
• Mostly written in Python, a few in Java and C
• Storage optimized for each service; mostly PostgreSQL, Cassandra and Tokyo Cabinet
• Many services use in-memory caching, for example /dev/shm or memcached
• Usually a small daemon, talking HTTP or Hermes (a minimal HTTP daemon is sketched below)
• We have our own supervisor which keeps services running
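For flavor, a minimal sketch of what such a small HTTP daemon could look like, using only the Python standard library (the endpoint and payload are made up for illustration, not an actual Spotify service):

    # Sketch: a tiny HTTP service daemon, standard library only.
    # The response body is hypothetical, purely for illustration.
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = json.dumps({'status': 'ok', 'path': self.path}).encode()
            self.send_response(200)
            self.send_header('Content-Type', 'application/json')
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == '__main__':
        HTTPServer(('', 8080), Handler).serve_forever()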
Aside: Hermes
• ZeroMQ for transport, protobuf for envelope and payload
• HTTP-like verbs and caching
• Request-reply and publish/subscribe (a request-reply sketch follows below)
• Very performant and introspectable
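Hermes itself is internal, but here is a minimal sketch of the transport idea: request-reply over ZeroMQ, with the verb and URI carried as separate frames where the real system uses a protobuf envelope. pyzmq is assumed installed, and the frame layout and hm:// URI are my assumptions, not the actual Hermes wire format:

    # Sketch: HTTP-like request-reply over ZeroMQ (pyzmq).
    import zmq

    ctx = zmq.Context()

    # Server side: reply to requests.
    rep = ctx.socket(zmq.REP)
    rep.bind('tcp://127.0.0.1:5555')

    # Client side: send an HTTP-like request as multipart frames.
    req = ctx.socket(zmq.REQ)
    req.connect('tcp://127.0.0.1:5555')
    req.send_multipart([b'GET', b'hm://search/v1/tracks?q=kent', b''])

    verb, uri, payload = rep.recv_multipart()
    rep.send_multipart([b'200', b'application/json', b'{"tracks": []}'])

    status, content_type, body = req.recv_multipart()
    print(status, body)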
How does the accesspoint find search?
• Everything has an SRV DNS record:
  • one record with the same name for each service instance
• Clients resolve to find servers providing that service
• The lowest-priority record is chosen, with a weighted shuffle (see the sketch below)
• Clients retry other instances in case of failures
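A minimal sketch of that selection logic, assuming records are (priority, weight, host, port) tuples; the hostnames are illustrative:

    # Sketch: pick a service instance from SRV-style records.
    # Lowest priority wins; records of that priority are shuffled by weight.
    import random

    records = [
        (10, 8,  'C8.spotify.com', 4070),
        (10, 16, 'C4.spotify.com', 4070),
        (20, 1,  'C9.spotify.com', 4070),  # only tried if the prio-10 hosts fail
    ]

    def weighted_shuffle(recs):
        """Repeatedly draw records with probability proportional to weight."""
        recs, order = list(recs), []
        while recs:
            total = sum(w for _, w, _, _ in recs)
            r = random.uniform(0, total)
            for i, (_, w, _, _) in enumerate(recs):
                r -= w
                if r <= 0:
                    order.append(recs.pop(i))
                    break
            else:
                order.append(recs.pop())
        return order

    lowest = min(prio for prio, _, _, _ in records)
    candidates = weighted_shuffle([r for r in records if r[0] == lowest])
    fallbacks = [r for r in records if r[0] != lowest]
    # Try instances in order; on connection failure, move on to the next.
    for prio, weight, host, port in candidates + fallbacks:
        print('would try', host, port)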
Read-only services
• Stateless
• Writes are hard
• Simple to scale, just add more servers
• Services can be restarted as needed
• Indexes prefabricated, distributed to live servers
Read-write services
• User-generated content, e.g. playlists
• Hard to ensure consistency of data across instances
• Solutions:
  • Eventual consistency: reads of just-written data are not guaranteed to be up to date
  • Locking, atomic operations:
    • creating globally unique keys, e.g. usernames (sketched below)
    • transactions, e.g. billing
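One concrete flavor of the atomic-operation approach: claiming a globally unique username by leaning on a PostgreSQL unique constraint. A minimal sketch, assuming psycopg2 and a made-up table layout:

    # Sketch: claim a unique username atomically via a UNIQUE constraint.
    # Assumes a table like: CREATE TABLE users (username text PRIMARY KEY);
    import psycopg2

    def claim_username(conn, username):
        try:
            with conn:  # commits on success, rolls back on error
                with conn.cursor() as cur:
                    cur.execute('INSERT INTO users (username) VALUES (%s)',
                                (username,))
            return True
        except psycopg2.IntegrityError:
            return False  # somebody else got there first

    conn = psycopg2.connect('dbname=accounts')  # hypothetical DSN
    print(claim_username(conn, 'ngn'))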
Sharding
• Some services use Dynamo-inspired DHTs (a minimal hash ring is sketched below)
• Each request has a key
• Each service node is responsible for a range of hash keys
• Data is distributed among service nodes
• Redundancy is ensured by writing to a replica node
• Data must be transitioned when the ring changes
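A minimal consistent-hash ring in the spirit of such DHTs; node names, virtual-node count and replica count are made up for illustration:

    # Sketch: a consistent-hash ring with virtual nodes and replicas.
    import bisect
    import hashlib

    class HashRing:
        def __init__(self, nodes, vnodes=32):
            self.ring = []  # sorted list of (hash, node)
            for node in nodes:
                for i in range(vnodes):  # virtual nodes smooth the distribution
                    self.ring.append((self._hash('%s-%d' % (node, i)), node))
            self.ring.sort()

        @staticmethod
        def _hash(key):
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def nodes_for(self, key, replicas=2):
            """Primary plus replica nodes for a key, walking the ring clockwise."""
            replicas = min(replicas, len({n for _, n in self.ring}))
            i = bisect.bisect(self.ring, (self._hash(key),))
            found, nodes = set(), []
            while len(nodes) < replicas:
                _, node = self.ring[i % len(self.ring)]
                if node not in found:
                    found.add(node)
                    nodes.append(node)
                i += 1
            return nodes

    ring = HashRing(['storage1', 'storage2', 'storage3'])
    print(ring.nodes_for('some-request-key'))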
search
• Java service
• Lucene storage
• New index published daily
• Doesn’t store any metadata itself; returns a list of identifiers
• (Search suggestions are served from a separate service, optimized for speed)
Metadata services
• Multiple read-only services
• 60 GB indices
• Respond to metadata requests
• Decorate metadata onto other services’ responses
• We’re most likely moving away from this model
Another aside: How does stuff get into Spotify?
• >15 million tracks; we can’t maintain all of that ourselves
• Ingest audio, images and metadata from labels
• Receive, transform, transcode, merge
• Everything ends up in a metadata database from which indices are generated and distributed to services
The Kent bug
• Much of the metadata lacks identifiers, which leaves us with heuristics; a sketch of such a heuristic follows below.
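To make the problem concrete, here is the kind of heuristic that matching without identifiers forces on you: fuzzy comparison of normalized artist and title strings. The fields, threshold and examples are all made up, and with several distinct artists sharing a name like "Kent", this is exactly where such heuristics go wrong:

    # Sketch: heuristic matching of label metadata without stable identifiers.
    import difflib

    def normalize(s):
        return ' '.join(s.lower().split())

    def looks_like_same_track(a, b, threshold=0.9):
        """Fuzzy-match two (artist, title) records. Threshold is arbitrary."""
        artist_sim = difflib.SequenceMatcher(
            None, normalize(a['artist']), normalize(b['artist'])).ratio()
        title_sim = difflib.SequenceMatcher(
            None, normalize(a['title']), normalize(b['title'])).ratio()
        return artist_sim >= threshold and title_sim >= threshold

    incoming = {'artist': 'Kent', 'title': 'Some Song'}
    existing = {'artist': 'Kent', 'title': 'Some  Song '}
    # Same name, but possibly a different band entirely: the "Kent bug".
    print(looks_like_same_track(incoming, existing))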
Audio encodings and files
• Spotify supports multiple audio encodings:
  • Ogg Vorbis 96 kbit/s (-q2), 160 kbit/s (-q5) and 320 kbit/s (-q9)
  • MP3 320 kbit/s (downloads)
• For each track, a file for each encoding/bitrate is listed in the returned metadata
• The client picks an appropriate one (sketched below)
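A minimal sketch of that choice; the metadata layout, file IDs and quality setting are assumptions for illustration:

    # Sketch: pick an audio file from the metadata returned for a track.
    FILES = [
        {'format': 'OGG_VORBIS', 'bitrate': 96000,  'file_id': 'file-aa'},
        {'format': 'OGG_VORBIS', 'bitrate': 160000, 'file_id': 'file-bb'},
        {'format': 'OGG_VORBIS', 'bitrate': 320000, 'file_id': 'file-cc'},
    ]

    def pick_file(files, preferred_bitrate):
        """Choose the listed file closest to the client's quality setting."""
        return min(files, key=lambda f: abs(f['bitrate'] - preferred_bitrate))

    print(pick_file(FILES, 160000))  # e.g. a "high quality" setting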
Get the audio data
• The client now must fetch the actual audio data
• Latency kills
Cache
• Player caches tracks it has played
• Caches are large (56% are over 5 GB)
• Least Recently Used (LRU) policy for cache eviction (sketched below)
• 50% of data comes from local cache
• Cached files are served in P2P overlay
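A minimal sketch of LRU eviction for a track cache, built on the standard library's OrderedDict; the capacity is a toy number, real caches hold gigabytes:

    # Sketch: LRU eviction for a track cache.
    from collections import OrderedDict

    class TrackCache:
        def __init__(self, capacity=3):
            self.capacity = capacity
            self._data = OrderedDict()  # oldest entries first

        def get(self, track_id):
            if track_id not in self._data:
                return None
            self._data.move_to_end(track_id)  # mark as most recently used
            return self._data[track_id]

        def put(self, track_id, audio):
            if track_id in self._data:
                self._data.move_to_end(track_id)
            self._data[track_id] = audio
            while len(self._data) > self.capacity:
                self._data.popitem(last=False)  # evict least recently used

    cache = TrackCache()
    for t in ['t1', 't2', 't3']:
        cache.put(t, b'audio')
    cache.get('t1')          # touch t1 so it survives
    cache.put('t4', b'audio')  # evicts t2, the least recently used
    print(list(cache._data))   # ['t3', 't1', 't4']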
Streaming
• Request the first piece from Spotify storage
• Meanwhile, search peer-to-peer (P2P) for the remainder
• Switch back and forth between Spotify storage and peers as needed
• Towards the end of a track, start prefetching the next one (the scheduling idea is sketched below)
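A sketch of that scheduling idea; the fetch functions, piece size and prefetch point are hypothetical stand-ins, not the real client API:

    # Sketch: scheduling playback between storage and peers.
    PIECE = 64 * 1024      # made-up piece size
    PREFETCH_POINT = 0.9   # start on the next track near the end of this one

    def fetch_from_storage(track, offset, length):
        return b'S' * min(length, track['size'] - offset)

    def fetch_from_peers(track, offset, length):
        if (offset // PIECE) % 3 == 0:  # pretend peers lack some pieces
            return None
        return b'P' * min(length, track['size'] - offset)

    def stream(track, next_track=None):
        # First piece always comes from Spotify storage: latency kills.
        yield fetch_from_storage(track, 0, PIECE)
        offset = PIECE
        while offset < track['size']:
            # Prefer peers for the bulk; fall back to storage when no peer
            # has the piece (or, in the real client, when the buffer runs low).
            data = (fetch_from_peers(track, offset, PIECE)
                    or fetch_from_storage(track, offset, PIECE))
            yield data
            offset += len(data)
            if next_track and offset >= track['size'] * PREFETCH_POINT:
                fetch_from_peers(next_track, 0, PIECE)  # warm up the next track
                next_track = None

    track = {'size': 10 * PIECE}
    print(sum(len(p) for p in stream(track)))  # == track size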
P2P
• All peers are equals (no supernodes)
• A user only downloads data she needs
• tracker service keeps peers for each track (sketched below)
• P2P network becomes (weakly) clustered by interest
• Oblivious to network architecture
• Does not enforce fairness
• Mobile clients do not participate in P2P

http://www.csc.kth.se/~gkreitz/spotify/kreitz-spotify_kth11.pdf
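A minimal sketch of the core of such a tracker, mapping track IDs to peers; the in-memory dict, peer cap and addresses are assumptions for illustration:

    # Sketch: a tracker service keeping peers per track.
    from collections import defaultdict

    MAX_PEERS_RETURNED = 10

    class Tracker:
        def __init__(self):
            self._peers = defaultdict(set)  # track_id -> {(host, port), ...}

        def announce(self, track_id, peer):
            """A client reports that it has (part of) a track in its cache."""
            self._peers[track_id].add(peer)

        def lookup(self, track_id, asking_peer):
            """Return some peers that have the track, excluding the asker."""
            peers = self._peers[track_id] - {asking_peer}
            return list(peers)[:MAX_PEERS_RETURNED]

    tracker = Tracker()
    tracker.announce('track-1', ('10.0.0.5', 4070))
    tracker.announce('track-1', ('10.0.0.9', 4070))
    print(tracker.lookup('track-1', ('10.0.0.9', 4070)))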
YAA: Hadoop
• We run analysis using Hadoop, which feeds back into the process described above; for example, track popularity is used for weighting search results and toplists (a sketch of such a job follows below)
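As an illustration of the kind of job involved, a sketch of a Hadoop Streaming mapper and reducer counting plays per track; the log format and field positions are made up:

    # Sketch: Hadoop Streaming mapper + reducer counting plays per track.
    # Assumes play-log lines like: "<timestamp> <user> <track_id>".
    # Hadoop sorts mapper output by key before the reducer sees it.
    import sys

    def mapper():
        for line in sys.stdin:
            fields = line.split()
            if len(fields) >= 3:
                print('%s\t1' % fields[2])

    def reducer():
        current, count = None, 0
        for line in sys.stdin:
            track_id, n = line.rstrip('\n').split('\t')
            if track_id != current:
                if current is not None:
                    print('%s\t%d' % (current, count))
                current, count = track_id, 0
            count += int(n)
        if current is not None:
            print('%s\t%d' % (current, count))

    if __name__ == '__main__':
        mapper() if sys.argv[1] == 'map' else reducer()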
Development at Spotify
• We use almost exclusively open source software
• Git, Debian, Munin, Zabbix, Puppet, TeamCity...
• Developers use whatever development tools they are comfortable with
• Scrum or Kanban in three-week iterations
• DevOps heavy. Freaking awesome ops
• Monitor and measure all the things!
Development at Spotify
• Development hubs in Stockholm, Göteborg and NYC
• All in all, >220 people in tech
• Very talented team
• Hackdays and system owner days in each iteration
• We hang out on IRC
• Growing and hiring
Thank you

Want to work at Spotify?
http://www.spotify.com/jobs/