
Summary:

Pinterest has learned about scaling the way most popular sites do: the architecture works until one day it doesn't. In a talk at the Surge Conference, two Pinterest engineers shared their war stories. Here's what they learned about keeping it simple and sharding their database.

What happens when a site goes viral and more than doubles its user base every month? It breaks, of course. Here's how the popular photo-based social network Pinterest handled the problem, and a few tips from Pinterest engineers on how others might avoid the same sort of trouble.

Marty Weiner and Yashh Nelapati of Pinterest shared the lessons from that experience on Thursday at the Surge Conference in Baltimore, Md., with most of the tips being about how the site has scaled its MySQL database. It’s a presentation the guys have given before, and slides can be found here. But for those who want the big picture in a few bullet points, here ya go. (And if you’re interested in the design of Pinterest, CEO Ben Silbermann will be speaking at our RoadMap conference).

Simply put, Pinterest learned quickly that too much complexity was its enemy if it wanted its infrastructure to scale as fast as the site was growing. Pinterest began in March 2010 hosted on Rackspace using one MySQL database and one small web engine. By its January 2011 launch, it had migrated to Amazon's EC2, a few more MySQL databases, a few Nginx web servers, MongoDB and TaskQueue. As it transitioned to its big-growth stage, it began running more and more tools, including Memcached, Redis and at least three others.

So the first lesson Weiner shared was not to do that: Instead of running a bunch of tools, simplify. The tools he decided to focus on shared the following characteristics: they were free; they had a large and happy user base; and they all had good or really good performance. Those tools were Amazon, Memcached, Redis and MySQL. Granted, getting them to scale properly wasn’t an engineering-free task, but at least everything was manageable when the work was done.

One of the tougher choices Pinterest had to make was a decision between clustering and sharding. Weiner described a continuum between the two, portraying clustering as automatic distribution of data through tools like Cassandra, HBase and Membase, and sharding as a manual act of deciding where to put data on a machine-by-machine basis. Given his choice — sharding — he’s clearly a fan of control for his database technologies.

He complained that while the automatic distribution of data across servers was cool and easy, it also came with a big point of failure: because the cluster-management software ran on his database servers and decided how data spread across them, bugs and errors in that code would be automatically replicated across the entire cluster.
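The control Weiner favored can be made concrete with a sketch. In a manually sharded setup, the application, not the database, decides which server holds which data, typically via an explicit shard map it can edit deliberately. This is a hypothetical illustration, not Pinterest's actual code; the shard count, host names and function names are invented:

```python
# A "manual" sharding sketch: the app owns the mapping from data to servers.
# Hypothetical example -- shard count and host names are made up.

NUM_SHARDS = 1024  # fixed logical shards, chosen up front

# Which physical MySQL server currently holds each block of logical shards.
# Moving shards to a new server is an explicit edit to this map, not
# something cluster software does behind your back.
SHARD_MAP = {
    range(0, 512): "mysql-a.internal",
    range(512, 1024): "mysql-b.internal",
}

def shard_for(entity_id: int) -> int:
    """Deterministically map an entity ID to a logical shard number."""
    return entity_id % NUM_SHARDS

def host_for(shard: int) -> str:
    """Look up which database server holds a given logical shard."""
    for shard_range, host in SHARD_MAP.items():
        if shard in shard_range:
            return host
    raise KeyError(f"no host for shard {shard}")

print(host_for(shard_for(123)))  # one user ID always routes to one known host
```

A bug here stays in one codebase the team controls, which is the tradeoff Weiner was pointing at: more up-front engineering in exchange for no cluster-wide failure mode hidden inside someone else's distribution logic.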

The rest of his talk focused on how to shard a lot of data and keep growing. For those interested in a deep dive on that technology, check out a video of the same talk he gave in May. For those thinking about building new scalable apps, the key lessons are probably these: keep it simple, go for popular and well-liked tools that are free, and seriously consider the tradeoff between control and ease of use.

And for those of you who just like pinning photos to Pinterest, you can sleep well with the knowledge that thanks to the way the site sharded its database, all of your pins likely reside on the same server right next to your user ID. And that’s a good thing, because it makes it much easier to then scale the service to more users, without everything breaking down.
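That colocation falls out of how the shard key is chosen: if a pin's shard is derived from its owner's user ID rather than from the pin's own ID, the pin rows land on the same server as the user row. A toy sketch of the idea, with invented names and shard count (not Pinterest's actual scheme):

```python
# Colocating a user's pins with the user record, by routing both
# through the owner's user ID. Hypothetical illustration only.

NUM_SHARDS = 1024

def shard_for_user(user_id: int) -> int:
    """A user record lives on the shard its ID hashes to."""
    return user_id % NUM_SHARDS

def shard_for_pin(owner_user_id: int, pin_id: int) -> int:
    # Route by the pin's OWNER, not the pin's own ID, so a user
    # and all of that user's pins live on the same shard.
    return shard_for_user(owner_user_id)

user = 987_654
pins = [101, 102, 103]
# Every pin for this user resolves to the same shard as the user record,
# so rendering a profile page can hit a single database server.
assert all(shard_for_pin(user, p) == shard_for_user(user) for p in pins)
```

The practical payoff is exactly the one the article describes: a profile page becomes a single-server query instead of a scatter-gather across the fleet.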


  1. I find it amazing that sharding, or in other words the idea of “scale out by splitting and parallelizing data across shared-nothing commodity-hardware” is not supplied “out of the box” by the infrastructure (such as database). It’s like the database has outsourced it to the application…

    ScaleBase (http://www.scalebase.com) (disclaimer: I work there) is a maker of a complete scale-out solution, an “automatic scale-out machine” if you like. I think the Pinterest story is great, with a great outcome, but that’s not always the case with this complex matter, and a generic, repeatable, IT-level solution for scale-out can make it much easier for all the other “Pinterests” out there to make the right choice and enjoy the great benefits – without the tremendous effort and labor of home-grown sharding.

  2. Epic epic epic. Love reading these scale stories. Thanks.

  3. This sounds like a case where Windows Azure’s SQL federations could scale infinitely. It auto shards a database by certain keys. We decided to manually shard our database, but in the right circumstance, perhaps something as simple as Pinterest, SQL federations may solve it. http://msdn.microsoft.com/en-us/library/windowsazure/hh597452.aspx

Comments have been disabled for this post