Summary:

Facebook engineers have tested a 64-core chip from Tilera and found it ideal for grabbing data quickly from key value stores. This may galvanize the creation of new benchmarks as the debate over which architecture works best for webscale and cloud computing rages.

Facebook engineers have tested a 64-core specialty chip from Tilera and found it more efficient for grabbing data quickly from key value stores. This test, and others performed across the industry on alternatives to x86 chips, may galvanize the creation of new benchmarks for the server industry as the debate over which architecture works best for webscale and cloud computing rages.

The paper, issued Monday and written by three Facebook engineers and one Tilera engineer, is called “Many-Core Key Value Store.” The goal was to test Facebook’s memcached structure using Intel 4-core Xeon and AMD 8-core processors, the de facto standard in most data centers today, against Tilera’s lower-clocked but massively multicore chips. The paper found that for large key value stores, such as the one Facebook uses to store its user data, Tilera’s many-core chips were more efficient than multiple Intel or AMD boxes, despite the higher costs associated with using non-commodity chips. Those costs include both the silicon itself and the work of tweaking the software to run on a different instruction set. From the paper:

Low-power many-core processors are well suited to KV-store workloads with large amounts of data. Despite their low clock speeds, these architectures can perform on-par or better than comparably powered low-core-count x86 server processors. Our experiments show that a tuned version of Memcached on the 64-core Tilera TILEPro64 can yield at least 67% higher throughput than low-power x86 servers at comparable latency. When taking power and node integration into account as well, a TILEPro64-based S2Q server with 8 processors handles at least three times as many transactions per second per Watt as the x86-based servers with the same memory footprint.
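The efficiency metric the paper cites, transactions per second per Watt, is simple to compute once throughput and power draw are known. Here is a minimal Python sketch of that arithmetic. The `ServerConfig` class, the `efficiency_ratio` helper and every number in it are hypothetical placeholders chosen to keep the math easy to follow. None of them are measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class ServerConfig:
    """One server setup; all values used below are illustrative placeholders."""
    name: str
    transactions_per_second: float
    power_watts: float

    def tps_per_watt(self) -> float:
        # Throughput normalized by power draw.
        return self.transactions_per_second / self.power_watts


def efficiency_ratio(candidate: ServerConfig, baseline: ServerConfig) -> float:
    """How many times more transactions per second per Watt the candidate handles."""
    return candidate.tps_per_watt() / baseline.tps_per_watt()


# Hypothetical numbers, NOT taken from the paper.
baseline = ServerConfig("x86 server (hypothetical)", 100_000.0, 200.0)
candidate = ServerConfig("many-core server (hypothetical)", 150_000.0, 100.0)

ratio = efficiency_ratio(candidate, baseline)
print(f"{candidate.name}: {candidate.tps_per_watt():.0f} TPS/W")
print(f"{baseline.name}: {baseline.tps_per_watt():.0f} TPS/W")
print(f"Efficiency ratio: {ratio:.1f}x")
```

In this made-up case, a box with 50 percent more raw throughput comes out three times as efficient once its lower power draw is counted. That is why per-Watt figures can tell a different story than throughput alone.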

Although Tilera can’t disclose whether or not this paper means Facebook is buying boxes with Tilera silicon inside, the paper is one of many cracks that will eventually break the x86 hegemony in the data center, as power constraints and monolithic applications change the economics of computing. No longer does cheap general-purpose hardware win automatically. With companies deploying tens of thousands of servers, sometimes all in service of one application, driving costs out of the boxes and operations has become essential. And because these servers in webscale or cloud operations generally act as nodes in one single application (or an aspect of a web service), it makes sense to write custom software for the platform if the savings in power or boosts in performance are sufficient.

So Facebook’s tests could help Tilera boost its business and win acceptance among companies whose applications depend on a key value store, such as Twitter or Zynga. The tests are also part of a trend among larger server buyers of rethinking not just the hardware design, but also how they measure their hardware and its performance. Ihab Bishara, director of cloud computing products at Tilera, notes that many of the big webscale companies have hired silicon engineers to help eke out as much efficiency as possible from their systems and to evaluate the best ways to measure performance for the types of workloads they run on their machines. These trends are set to shake up the world of hardware.

For a detailed look at this, watch the video below with Omid Tahernia, president and CEO at Tilera.

  1. Vladimir Rodionov Monday, July 25, 2011

    Sorry, but it looks like a biased report. They compare the stock x86 memcached version, which is known to have poor scalability due to primitive concurrency control (the AMD and Xeon numbers prove this), against a Tilera version optimized for concurrency. This is strange. Facebook is known as a major committer to the Memcached project and has developed many performance patches afaik, but for some reason they have not been used in this case study.

    1. First of all, the absolute numbers above are too low. 350K requests

    1. Sorry, there are some leftovers in my previous post. I cannot remove them since there is no way to edit a post here.
