In the name of scale

Netflix is revamping its data architecture for streaming movies

Netflix is revamping the computing architecture that processes data for its streaming video service, according to a Netflix blog post that came out on Tuesday.

The [company]Netflix[/company] engineering team wanted an architecture that can handle three key areas the video-streaming giant believes greatly affects the user experience: knowing what titles a person has watched; knowing where in a given title did a person stop watching; and knowing what else is being watched on someone’s account, which is helpful for family members who may be sharing one account.

Although Netflix’s current architecture allows the company to handle all of these tasks, and the company built a distributed stateful system (meaning that the system keeps track of all user interaction and video watching and can react to any of those changes on the fly) to handle the activity, Netflix “ended up with a complex solution that was less robust than mature open source technologies” and wants something that’s more scalable.

Netflix’s current architecture looks like this:

Netflix architecture figure
Netflix architecture figure

There’s a viewing service that’s split up into a stateful tier that stores the data for active views in memory; Cassandra is used as the primary data store with the Memcached key-value store built on top for data caching. There’s also a stateless tier that acts as “a fallback mechanism when a stateful node was unreachable.”

This basically means that when an outage occurs, the data stored in the stateless tier can transfer over to the end user, even though that data may not be exactly as up-to-date or as relevant as the data held in the stateful tier.

In regard to caching, the Netflix team apparently finds Memcached helpful for the time being, but is looking for a different technology “that natively supports first class data types and operations like append.”
From the blog post:
[blockquote person=”Netflix” attribution=”Netflix”]Memcached offers superb throughput and latency characteristics, but isn’t well suited for our use case. To update the data in memcached, we read the latest data, append a new view entry (if none exists for that movie) or modify an existing entry (moving it to the front of the time-ordered list), and then write the updated data back to memcached. We use an eventually consistent approach to handling multiple writers, accepting that an inconsistent write may happen but will get corrected soon after due to a short cache entry TTL and a periodic cache refresh.[/blockquote]

Things got a bit more complex from an architecture perspective when Netflix “moved from a single AWS region to running in multiple AWS regions” as the team had “to build a custom mechanism to communicate the state between stateful tiers in different regions,” which obviously means having to keep track of a lot more moving parts.

For Netflix’s upcoming architecture overhaul, the company is looking at a design that accommodates these three principles: availability over consistency; microservices; and polyglot persistence, which means having multiple data storage technologies to be used for different, specific purposes.

The new system will look something like this:

Netflix future architecture
Netflix future architecture

There’s not a lot of information as to what exactly will be the technologies that comprise this new architecture, but Netflix said it will be following up in a future post with more details. Judging by the picture of the eye in the diagram, it looks like Cassandra will still be one of them.

2 Responses to “Netflix is revamping its data architecture for streaming movies”

  1. exhibit44

    Universal real time feedback on the preferences and viewing habits of every customer, sorted by demographics. The studio titan of old never dreamed of such a godlike power.

  2. Before I dropped streaming services, I had a not-rare-enough issue with Netflix where it would forget I watched 20-30 minutes of something. It doesn’t look like they want to solve that issue either so I’ll wait longer.