How Facebook Squeezes More From Its Machines

Back in June, during an onstage conversation with me at our Structure 09 conference, Facebook VP of Technical Operations Jonathan Heiliger lamented how chip makers such as Intel and AMD don’t quite understand the needs of web behemoths like his company, instead touting benchmarks and metrics that are far removed from reality.

Industry-standard benchmarks, such as those published by The Standard Performance Evaluation Corporation (SPEC), can be reasonable indicators of maximum throughput for certain workloads. At Facebook, we recognized these benchmarks wouldn’t necessarily represent our application behavior under real-world conditions and developed a proprietary analysis methodology.

Frustration with those benchmarks is what led Facebook to build its capacity testing tool, Dyno, which the company has been using since July. Yesterday, Jonathan and two of his colleagues, Marco Baray and Jason Taylor, shared some details as to how, exactly, Facebook benchmarks server performance, and how that has helped the company squeeze the most out of its machines. The findings are shared in a white paper entitled “Real-World Web Application Benchmarking” (embedded at the end of this post). Taylor spearheaded the project for Facebook.

“We wanted to get rid of the ad-hoc nature of server performance measurement and deployment,” said Heiliger of Dyno. The tool is named after the dynamometer, a device that measures force or power, typically in automobiles, and it does the same for Facebook servers. “Effectively we are doing the same for the servers where we are focused on throughput and server capacity,” said Heiliger. “When you get to a certain scale, say, 100 servers, you need to have something like Dyno.” It also allows the company to constantly optimize its software stack to get the most out of its hardware. “It allows us to more effectively measure the performance of our server infrastructure and then derive the most out of it,” said Heiliger. From the white paper:

Anecdotally, when Facebook switched from an FB-DIMM platform to the Intel San Clemente platform, utilizing DDR2 memory, we observed an unexpected increase in throughput.  This performance boost initiated an investigation that found the web application to be memory- and CPU-bound.  The decreased latency of the DDR2 architecture provided a significant increase in web node throughput.

Baray explained that as Facebook adds more features to its service, it becomes more complex. “The web site becomes heavier, so we need to constantly adapt our capacity and figure out how we manage it smartly,” he said. In order to do that, the company needs to constantly monitor its data as effectively as possible. And that’s where Dyno comes in handy.

For example, the company recently added new servers based on Intel’s Nehalem/Tylersburg chip architecture, which delivered markedly better performance than its existing Harpertown-based servers. “There was an over 40 percent difference, which is huge when you have thousands of servers,” said Heiliger.

Knowing which servers can handle more load and provide more throughput allows Facebook to dynamically shift traffic around in order to achieve top performance. Facebook has more than 30,000 servers, according to some estimates. The company adds roughly 10,000 new ones every 18 months. At that scale, it can’t afford not to squeeze the most out of its machines.

Real World Bench Marking v10