Benchmarking the Cloud: Your Mileage May Vary


Cloud providers sometimes tout performance metrics obtained from benchmarking tests as undeniable proof that their clouds are better than the competition. The actual value to be derived from cloud benchmarks, however, is less black and white. Even the organizations performing the tests acknowledge that although benchmarks can be great guidance in some circumstances, they might warrant little consideration in others. As with all things cloud, users should rely on benchmarks and other measurements at their own risk.

You’d Better Like Apples

Probably the biggest concern (or, more accurately, misconception) with cloud benchmarks is that although they do provide a relatively objective comparison of providers, they’re only truly telling if you like apples.

Ask Jason Read, the founder of CloudHarmony, one of the first cloud-benchmarking services to hit the scene. CloudHarmony is “trying to provide an apples-to-apples comparison of different providers,” he explained during a recent phone call, but it can’t guarantee to any particular cloud user how its application will run on top of any given provider’s cloud infrastructure. CloudHarmony runs a variety of benchmarks developed by Phoronix, but they’re not real-world, custom-built applications. At best, he says, users can expect third-party benchmarks to point toward the cloud that might be most effective in their particular circumstances; those users still have to do their due diligence by running their own tests.

Or as Doug Willoughby, the director of cloud strategy for Compuware, phrases it, “Your mileage may vary.” The reason is that although benchmarks put each cloud through its paces using a standard application and standard configurations, there are all sorts of variables at play. Probably the most obvious ones are the user’s application and desired configuration; if those are not identical to what was used in the benchmark test, there’s a good chance that the user’s actual experience won’t mirror the result achieved during the test. Every user’s application is unique, and most cloud providers offer a wide range of operating systems, instance sizes and database and storage options, so the chances of creating the same environment aren’t high.


Another important variable is the network, which, as everyone is aware, can experience varying degrees of performance based on overall Internet traffic or a particular ISP’s last-mile network. Such is the case with Compuware’s CloudSleuth service, which monitors the latency of a generic e-retail application hosted on a number of leading clouds.

CloudSleuth runs about 200 different tests to measure network performance, hitting each provider with about 3,000 individual tests per day. Willoughby says that although the performance line for each provider should be fairly flat over a long period, that’s never the case. This is because of the myriad factors that can slow down network traffic below what the baseline should be, such as network congestion or packet loss. For this reason, in fact, he says CloudSleuth is just the starting point for a discussion about cloud computing network performance, and he calls it a visualization tool to see what’s going on rather than a benchmark service to definitively gauge providers’ latency.
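The variability Willoughby describes is easier to see when a series of response-time measurements is summarized rather than reduced to one number. The following is a minimal sketch under stated assumptions, not CloudSleuth's method: the function name summarize_latency and the sample values are hypothetical, and a real series would come from your own monitoring data.

```python
import statistics


def summarize_latency(samples_s):
    """Summarize response-time samples (seconds) from repeated tests of one provider."""
    if len(samples_s) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples_s)
    mean = statistics.mean(ordered)
    # Nearest-rank 95th percentile, using integer math: rank = ceil(0.95 * n).
    rank = (95 * len(ordered) + 99) // 100
    return {
        "mean": mean,
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        # Coefficient of variation: spread relative to the mean (0 means a flat line).
        "cv": statistics.stdev(ordered) / mean,
    }


if __name__ == "__main__":
    # Hypothetical samples, invented for illustration only.
    hypothetical_samples_s = [3.1, 2.9, 3.0, 3.4, 2.8, 5.2, 3.0, 3.1, 2.9, 3.3]
    for name, value in summarize_latency(hypothetical_samples_s).items():
        print(f"{name}: {value:.2f}")
```

A single slow sample barely moves the median but shows up in the 95th percentile and the coefficient of variation, which is why a long-run average alone can hide the congestion and packet-loss effects described above.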

There’s also the issue of choosing a uniform instance size and configuration against which to benchmark each cloud. This concern cuts both ways: It might be difficult to get identical CPU, memory and storage specs from each cloud provider, and some clouds offer far-higher-performing instances than those that end up being tested.

Take, for instance, the case of Amazon Web Services and its Cluster Compute Instances. Although AWS rarely achieves the highest performance in any category in CloudHarmony tests, none of those tests used Cluster Compute Instances as the AWS configuration. When CloudHarmony did benchmark a Cluster Compute Instance, it either bested all the other 134 servers that CloudHarmony had previously tested (including standard AWS instances) or was among the top few performers. At $1.60 per hour, Cluster Compute Instances are priced comparably to other providers’ most-expensive virtual machines, but they haven’t been benchmarked with the rest of the pack, because nothing else really compares, architecture-wise.

And speaking of price, users also are concerned with price performance rather than just performance. As an AWS spokesperson told me during a recent call, this is one reason why AWS doesn’t sweat it when competitors tout their platforms’ outperformance of EC2. Even if another cloud consistently surpasses EC2 in benchmarks or while running real-world applications, it’s rarely by such a margin that it makes EC2 look inadequate by comparison. When potential customers start to consider EC2’s relatively low prices and the general feature-richness of the AWS platform, AWS can be confident that performance alone won’t dictate customers’ decisions.
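Price performance can be expressed as a simple ratio: benchmark score divided by hourly price. The sketch below is illustrative only. The function names, instance names, scores and prices are all hypothetical and do not come from CloudHarmony or any provider's price list.

```python
def price_performance(score, price_per_hour):
    """Return benchmark score per dollar per hour (higher is better)."""
    if price_per_hour <= 0:
        raise ValueError("price_per_hour must be positive")
    return score / price_per_hour


def rank_by_value(offers):
    """Sort (name, score, price_per_hour) tuples by score per dollar, best first."""
    return sorted(offers, key=lambda o: price_performance(o[1], o[2]), reverse=True)


if __name__ == "__main__":
    # Hypothetical offers: (name, benchmark score, price in dollars per hour).
    hypothetical_offers = [
        ("instance-a", 100.0, 2.00),
        ("instance-b", 60.0, 0.50),
        ("instance-c", 80.0, 1.00),
    ]
    for name, score, price in rank_by_value(hypothetical_offers):
        print(f"{name}: {price_performance(score, price):.1f} points per dollar-hour")
```

In this made-up data the instance with the highest raw score ranks last on value, which is the kind of reversal that makes raw benchmark rankings an incomplete basis for a decision.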

What Guidance Can We Take?

Don’t let the talk about not reading too much into cloud benchmarks fool you, though. There’s still plenty of useful information in that data. CloudHarmony tested a variety of instance sizes from each provider across several benchmarks in five categories: CPU performance, disk IO, memory IO, encoding and encryption, and Java/Ruby/PHP/Python performance. Looking at the results, a few noteworthy trends stand out:

  • Storm On Demand was a top performer almost across the board, occasionally outperforming other clouds by a considerable margin. Bluelock and GoGrid also consistently ranked among the top three providers in all categories.
  • AWS generally fared well — best in the Java/Ruby/PHP/Python test — especially with its larger instance sizes. As noted above, an AWS Cluster Compute Instance performed on par with or better than the field in CloudHarmony’s previous tests.
  • Rackspace lingered near the middle of the pack, while OpSource was often at or near the bottom.

Of course, it’s at this point that the variables discussed above come into play. Performance will vary based on the actual application being run and the overall architecture of any given customer’s deployment; there’s also something to be said about the intangible factors: AWS is a feature-rich platform that has performed well while supporting far more users than any competitor; Rackspace won’t even be the same cloud once it completes its OpenStack makeover; Bluelock is a VMware-only cloud; OpSource is designed for maximum networking and security flexibility; and Storm On Demand is unproven.

Network performance can’t be overlooked either, as the fastest cloud server in the world won’t be worth much if the user experience is crippled by a spotty connection. CloudSleuth’s data shows Microsoft Windows Azure (U.S. Central) delivering the fastest response times (2.82 seconds) for its test application during the 30-day period from April 27 to May 26. Following in successive order were Rackspace (U.S. East, 3.12 seconds), GoGrid (U.S. East, 3.28 seconds), OpSource (U.S. East, 3.45 seconds) and Google App Engine (3.54 seconds). AWS was tenth on the list during that time frame, with 3.85 seconds. Drilling down into 7-day, 24-hour and 6-hour periods, the order doesn’t change drastically.

These differences might not seem like much, but time can equal money on the Internet. Compuware’s Willoughby cites a study showing that every two seconds of wait time leads to an 8 percent abandonment rate, and four seconds results in a 25 percent abandonment rate. All things being equal on the server side — which, of course, they aren’t — a second here or there could be a really big deal for someone trying to make money with a cloud-hosted website.
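To put the CloudSleuth figures quoted above in perspective, it helps to express each provider's response time as a gap to the fastest one. The sketch below uses only the six 30-day averages given in the text; the function name gaps_to_fastest is hypothetical, and the providers ranked sixth through ninth are omitted because the text does not list them.

```python
# 30-day average response times in seconds, as quoted in the text.
# AWS ranked tenth; positions six through nine are not given, so they are absent here.
RESPONSE_TIMES_S = {
    "Windows Azure (U.S. Central)": 2.82,
    "Rackspace (U.S. East)": 3.12,
    "GoGrid (U.S. East)": 3.28,
    "OpSource (U.S. East)": 3.45,
    "Google App Engine": 3.54,
    "AWS": 3.85,
}


def gaps_to_fastest(times_s):
    """Map each provider to (seconds behind the fastest, percent slower than the fastest)."""
    fastest = min(times_s.values())
    return {
        name: (t - fastest, (t / fastest - 1) * 100)
        for name, t in times_s.items()
    }


if __name__ == "__main__":
    for name, (gap_s, pct) in gaps_to_fastest(RESPONSE_TIMES_S).items():
        print(f"{name}: +{gap_s:.2f} s ({pct:.1f}% slower than the fastest)")
```

On these numbers the tenth-place result is roughly one second behind the leader, which is the scale of difference the abandonment figures above are concerned with.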

[Figure: Cedexis response]

Cedexis also monitors network performance via its Radar service. Using different measurements and different source locations — Cedexis Radar measures response times from more than 1.7 million end-user nodes from around the world, whereas CloudSleuth’s tests come from 30 data centers in backbone nodes across the globe — Cedexis comes up with very different results. In the period from April 27 through May 25, its measurements (above) show AWS’ three Eastern regions leading the pack and Google App Engine lagging in the tenth-place spot. Rackspace, Windows Azure and GoGrid are in spots four, five and six, respectively, behind the AWS regions. This seems to confirm that those three providers deliver reliable network performance, but it might be worth users’ looking into why AWS fares so differently and whether that might affect their applications.

What’s Next?

If today’s collection of cloud computing benchmarks and performance measurements is best used as a guide, tomorrow’s might be a little more telling as a legitimate measure of what users can expect. For one, more enterprise applications are moving into the cloud, including some from SAP and Oracle, so there’s the possibility of real-world application benchmarking like we’ve seen in the physical server world. Such benchmarks still aren’t entirely accurate, but tests of popular, specific applications should be a lot more meaningful.

Further, there’s now a large number of products and services on the market that monitor the performance of users’ cloud servers. They range from startup cloud services such as ServerDensity to products from big-time vendors VMware (Hyperic) and CA (Nimsoft). There’s a lot of information there that might be ripe for big data tools and advanced analytics techniques, if the companies collecting it are willing to release anonymized data or run and share their own analyses. With a little effort, we could start to see a clear picture of which application types, configurations and clouds deliver the best performance.

In the meantime, though, cloud users don’t need to fear that they’re working in the dark when it comes to determining which cloud is the best fit for their applications. Benchmarks and performance measurements can provide good starting points, and cloud resources are getting less expensive by the day. It doesn’t require too much time, money or commitment to run test trials on several clouds and see which one does the job best.
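A trial of this kind can start very small: time a representative operation several times on each candidate cloud and compare the results. The following is a minimal sketch, assuming the operation can be wrapped in a Python callable. The names time_workload and sample_workload are hypothetical, and sample_workload is only a stand-in for a call into your own application.

```python
import statistics
import time


def time_workload(workload, repeats=5):
    """Run workload() `repeats` times and return the elapsed seconds for each run."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings


def sample_workload():
    """Hypothetical stand-in; replace with a call into your own application."""
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    timings = time_workload(sample_workload, repeats=5)
    print(
        f"median: {statistics.median(timings):.4f} s, "
        f"min: {min(timings):.4f} s, max: {max(timings):.4f} s"
    )
```

Running the same script on each cloud, at several times of day, gives a comparison that reflects your own application and configuration rather than a generic benchmark's.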
