When It Comes to Social Networks, Infrastructure Wins

Facebook is now playing host to half a billion people. The site is the central point of many users' daily lives. It is their newspaper. It is their photo site. It is their online social reality. And one of the main (and underappreciated) reasons it has been able to get there is its infrastructure.

The success of Facebook and its ability to handle 500 million users show that even on the people-centric social web, what ultimately matters is the ability to scale, and the infrastructure to support that scale. Just as Google has turned its infrastructure to its advantage by returning search results faster, Facebook has outwitted and outlasted competitors that stumbled over their own infrastructure challenges.

It is no small feat, by any yardstick. In a blog post outlining its growth, Robert Johnson, a Facebook director of engineering, notes that the service has:

  • 500 million active users
  • 100 billion hits per day
  • 50 billion photos
  • 2 trillion objects cached, with hundreds of millions of requests per second
  • 130 terabytes of logs every day

Johnson writes:

If you look at the graph of our growth, you’ll notice that there’s no point where it’s flat. We never get to sit back and take a deep breath, pat ourselves on the back, and think about what we might do next time. Every week we have our biggest day ever. We of course have a pretty good idea of where the graph is headed, but at every level of scale there are surprises. The best way we have to deal with these surprises is to have engineering and operations teams that are flexible, and can deal with problems quickly.

A flexible architecture has allowed Facebook to scale with its audience. According to web analytics and performance measurement service AlertSite, between April 1 and June 30, 2010, Facebook's average response time was about 1.02 seconds, nearly a quarter of Twitter's. Twitter also had the worst availability of all the social networks AlertSite tracked. From the AlertSite blog:

During Q2 we witnessed a worldwide Internet event — the World Cup — which began on June 11 and carried through into the current month. However, [Twitter] was ill-equipped for the volume of traffic it would receive. Twitter’s experience demonstrates the effect worldwide events such as the World Cup can have on a website, particularly when it has not prepared in advance. As demand for real-time information increases, consumer expectations for the time it should take a website to load follow suit. The performance of social sites must scale to meet these demands.

These performance issues have caused a lot of heartburn amongst Twitter’s developer and partner community. In an IDG News report published earlier today, Seesmic CEO Loic Le Meur put it bluntly:

We are generally used to the service going down without any warning and never surprised. We’re more surprised when it’s up for weeks without problems.

That is not a good reputation for any service to have. In many ways, fixing its unscalable infrastructure and unreliable service is what will keep Twitter from becoming the next Friendster, an early social network that lost all its momentum because of its pokey infrastructure. (Twitter is addressing the problems and is launching its own data center, as reported yesterday.)

The social web is very complex. Data on social networks is dynamic, constantly growing and always changing. And the problem is only going to get more difficult, because activity on a social network grows faster than its user count: every new user can connect and interact with every existing one.

Facebook’s VP of Technical Operations, Jonathan Heiliger, speaking at our Structure 2010 conference, put it best: “You can never think about scale too early.” Especially when it comes to social web services.
