
Summary:

Recently, Google and Amazon enhanced their networking options, getting a potential leg up in the four-way performance competition between cloud providers. But how do they compare with Rackspace and Softlayer? David Mytton puts all four through their paces to see which service really comes out ahead.

Compute and storage are essentially commodity services, which means that for cloud providers to compete, they have to show real differentiation. This is often achieved with supporting services like Amazon’s DynamoDB and Route 53, or Google’s BigQuery and Prediction API, which complement the core infrastructure offerings.

Performance is also often singled out as a differentiator. One of the things that bites production usage, especially in inherently shared cloud environments, is the so-called “noisy neighbor” problem: other guests stealing CPU time, increased network traffic and, particularly problematic for databases, I/O wait.

In this post I’m going to focus on networking performance. This is very important for any serious application because it affects the ability to communicate and replicate data across instances, zones and regions. Responsive applications and disaster recovery, areas where up-to-date database replication is critical, require good, consistent performance.

It’s been suggested that Google has a massive advantage when it comes to networking, due to all the dark fibre it has purchased. Amazon has some enhanced networking options that take advantage of special instance types with OS customizations, and Rackspace’s new Performance instance types also boast up to 10 Gbps networking. So let’s test this.

Methodology

I spun up the instances listed in each section below and tested the networking performance between them using the iperf tool on Linux. One instance acts as the server and the other as the client:

Server: iperf -f m -s

Client: iperf -f m -c hostname

The OS was Ubuntu 12.04 (with all latest updates and kernel), except on Google Compute Engine, where it’s not available. There, I used the Debian Backports image.

The client was run three times for each route type (within zone, between zones and between regions), with the mean taken as the reported value.
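
For reference, here’s a minimal sketch of how each client run could be scripted, assuming iperf is already listening on the server and hostname is the same placeholder used above:

    # Run the iperf client three times against a listening server and
    # average the reported throughput; "hostname" is a placeholder.
    for i in 1 2 3; do
        iperf -f m -c hostname | awk '/Mbits\/sec/ {print $(NF-1)}'
    done | awk '{ sum += $1 } END { print "mean:", sum / NR, "Mbits/sec" }'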

Amazon networking performance

Route                     | t1.micro (1 CPU) | c3.8xlarge (32 CPUs)
us-east-1a <-> us-east-1a | 135 Mbits/sec    | 7013 Mbits/sec
us-east-1a <-> us-east-1d | 101 Mbits/sec    | 3395 Mbits/sec
us-east-1a <-> us-west-1a | 19 Mbits/sec     | 210 Mbits/sec

Amazon’s larger instances, such as the c3.8xlarge tested here, support enhanced 10 Gbps networking, but you must use the Amazon Linux AMI (or manually install the drivers) within a VPC. Because of the additional complexity of setting up a VPC, which isn’t necessary on any other provider, I didn’t test this, although a VPC is now the default for new accounts. Even without that enhancement, the performance is very good, nearing the advertised 10 Gbits/sec.
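
As an aside, if you do go the VPC route, one way to confirm enhanced networking is actually active on an instance is to check which driver backs the network interface. This is a sketch based on EC2’s SR-IOV implementation using the ixgbevf driver; it wasn’t part of these tests:

    # On the instance: with enhanced networking active, the NIC is
    # backed by the ixgbevf (Intel SR-IOV) driver rather than the
    # standard Xen virtual interface driver.
    ethtool -i eth0
    # driver: ixgbevf  -> enhanced networking is in use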

However, the consistency of the performance wasn’t so good. The speeds changed quite dramatically across the three test runs for all instance types, much more than with any other provider.

You can use internal IPs within the same zone (free of charge) and across zones (incurs inter-zone transfer fees), but across regions, you have to go over the public internet using the public IPs, which incurs further networking charges.

Google Compute Engine networking performance

Route                            | f1-micro (shared CPU) | n1-highmem-8 (8 CPUs)
us-central-1a <-> us-central-1a  | 692 Mbits/sec         | 2976 Mbits/sec
us-central-1b <-> us-central-1b  | 905 Mbits/sec         | 3042 Mbits/sec
us-central-1a <-> us-central-1b  | 531 Mbits/sec         | 2678 Mbits/sec
us-central-1a <-> europe-west-1a | 140 Mbits/sec         | 154 Mbits/sec
us-central-1b <-> europe-west-1a | 137 Mbits/sec         | 189 Mbits/sec

Google doesn’t currently offer an Ubuntu image, so instead I used its backports-debian-7-wheezy-v20140318 image. For the f1-micro instance, I got very inconsistent iperf results for all zone tests. For example, within the same us-central-1a zone, the first run showed 991 Mbits/sec, but the next two showed 855 Mbits/sec and 232 Mbits/sec. Across regions between the US and Europe, the results were much more consistent, as were all the tests for the higher spec n1-highmem-8 server. This suggests the variability was because of the very low spec, shared CPU f1-micro instance type.

I tested more zones here than on other providers because on April 2, Google announced a new networking infrastructure in us-central-1b and europe-west-1a which would later roll out to other zones. There was about a 1.3x improvement in throughput using this new networking and users should also see lower latency and CPU overhead, which are not tested here.

Although 16 CPU instances are available, they’re only offered in limited preview with no SLA, so I tested on the fastest generally available instance type. Since networking is often CPU bound, there may be better performance available when Google releases its other instance types.

Google allows you to use internal IPs globally: within zone, across zones and across regions (i.e., traffic goes over internal, private transit instead of across the internet). This makes it much easier to deploy across zones and regions. Indeed, Google’s Cloud platform was the easiest and quickest to work with, in terms of the control panel, the speed of spinning up new instances and how quickly I could log in and run the tests.

Rackspace networking performance

Route                                 | 512 MB Standard (1 CPU) | 120 GB Performance 2 (32 CPUs)
Dallas (DFW) <-> Dallas (DFW)         | 595 Mbits/sec           | 5539 Mbits/sec
Dallas (DFW) <-> North Virginia (IAD) | 30 Mbits/sec            | 534 Mbits/sec
Dallas (DFW) <-> London (LON)         | 13 Mbits/sec            | 88 Mbits/sec

Rackspace does not offer the same kind of zone/region deployments as Amazon or Google, so I wasn’t able to run any between-zone tests; instead I picked the next closest data center. Rackspace offers an optional enhanced virtualization platform called PVHVM, which provides better I/O and networking performance and is available on all instance types. I used PVHVM images for these tests.
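
If you want to verify which mode a guest actually booted in, one possible check (an assumption based on the kernel’s standard Xen boot messages, not something from these tests) is to grep the boot log:

    # A PVHVM guest reports a paravirtualized kernel running on top of
    # Xen HVM; a plain PV guest omits the "HVM" part.
    dmesg | grep -i 'paravirtualized kernel'
    # e.g. "Booting paravirtualized kernel on Xen HVM"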

As with Amazon, you can use internal IPs within the same location at no extra cost, but across regions you need to use the public IPs, which incur data charges.

When trying to launch two 120 GB Performance 2 servers at Rackspace, I hit the account quota (with no other servers on the account) and had to open a support ticket to request an increase, which took about an hour and a half to approve. For some reason, launching servers in the London region also requires a separate account, and logging in and out of multiple control panels soon became annoying.

Softlayer networking performance

Route                  | 1 CPU, 1 GB RAM, 100 Mbps | 8 CPUs, 2 GB RAM, 1 Gbps
Dallas 1 <-> Dallas 1  | 105 Mbits/sec             | 911 Mbits/sec
Dallas 1 <-> Dallas 5  | 105 Mbits/sec             | 921 Mbits/sec
Dallas 1 <-> Amsterdam | 29 Mbits/sec              | 61 Mbits/sec

Softlayer only allows you to deploy into multiple data centers at one location: Dallas. All other regions have a single facility. Softlayer also caps out at 1 Gbps on its public cloud instances, although its bare metal servers have the option of dual 1 Gbps bonded network cards, allowing up to 2 Gbps. You choose the port speed when ordering or when upgrading an existing server. Softlayer also lists 10 Gbits/sec networking as available for some bare metal servers.

Like Google, Softlayer’s maximum instance size is 16 cores, but it also offers private CPU options, which give you dedicated cores rather than cores shared with other users. This allows up to eight private cores, at a higher price.

The biggest advantage Softlayer has over every other provider is completely free private networking between all regions, whereas all the other providers charge for transfer out of zone. With VLAN spanning enabled, you can use the private network across regions, which gives you an entirely private network for your whole account. This makes it very easy to deploy redundant servers across regions, and it’s something we use extensively for replicating MongoDB at Server Density, moving approximately 500 Mbits/sec of internal traffic across the US between Softlayer’s Washington and San Jose data centers. Not having to worry about transfer charges is a luxury only available with Softlayer.

Who is fastest?

Route           | Fastest (low spec) | Fastest (high spec) | Slowest (low spec) | Slowest (high spec)
Within zones    | Google             | Amazon              | Softlayer          | Softlayer
Between zones   | Google             | Amazon              | Rackspace          | Softlayer
Between regions | Google             | Amazon              | Rackspace          | Softlayer

Amazon’s high spec c3.8xlarge gave the best performance across all tests, particularly within the same zone and region. It was able to push close to the advertised 10 Gbits/sec throughput, but the high variability across test runs may indicate some inconsistency in real-world performance.

Yet for very low cost, Google’s low spec f1-micro instance type offers excellent networking performance: between regions it was up to ten times faster than the terrible performance from the low spec Rackspace server.

Softlayer and Rackspace were generally poor performers overall, although Rackspace’s high spec instance at least posted respectable inter-zone and inter-region numbers. Softlayer is the loser overall here, with low performance and no network-optimized instance types; only its bare metal servers can be upgraded to 10 Gbits/sec network interfaces.

Mbits/s per CPU?

CPU allocation is also important. Rackspace and Amazon both offer 32 core instances, and we see good performance on those higher spec VMs as a result. Amazon was fastest for its highest spec machine type with Rackspace coming second. The different providers have different instance types, and so it’s difficult to do a direct comparison on the raw throughput figures.

An alternative ranking method is to calculate how much throughput you get per CPU. We’ll take the high spec within-zone figures and do a simple division of the throughput by the number of CPUs:

Provider  | Throughput per CPU
Google    | 380 Mbits/sec
Amazon    | 219 Mbits/sec
Rackspace | 173 Mbits/sec
Softlayer | 113 Mbits/sec
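
The arithmetic behind each row is a simple division, using the within-zone throughput figures and CPU counts from the tables above (bc truncates to whole numbers):

    # Within-zone throughput divided by CPU count
    echo "3042 / 8"  | bc   # Google n1-highmem-8            -> 380
    echo "7013 / 32" | bc   # Amazon c3.8xlarge              -> 219
    echo "5539 / 32" | bc   # Rackspace 120 GB Performance 2 -> 173
    echo "911 / 8"   | bc   # Softlayer 8 CPUs, 1 Gbps       -> 113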

The best might not be the best value

If you have no price concerns, then Amazon is clearly the fastest, but it’s not necessarily the best value for money. Google delivers more Mbits/sec per CPU, and since pricing scales with CPU count, that makes it the better value. Google also offers the best performance on its lowest spec instance type, although it is quite variable due to the shared CPU. Rackspace was particularly poor when it came to inter-region transfer, and Softlayer isn’t helped by its lack of any kind of network-optimized instance type.

Throughput isn’t the end of the story though. I didn’t look at latency or CPU overhead, and these will have an impact on real-world performance. It’s no good getting great throughput if it requires 100 percent of your CPU time!
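
If you wanted to quantify that overhead yourself, one simple approach (a sketch, assuming mpstat from the sysstat package is installed) is to sample CPU usage while an iperf test runs:

    # Run a 30-second iperf test in the background and sample CPU usage
    # once a second for the same period; high %sys/%soft values show
    # throughput being paid for in CPU time.
    iperf -f m -c hostname -t 30 &
    mpstat 1 30
    wait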

Google and Softlayer both have an advantage when it comes to operational simplicity: their VLAN spanning-style features give you a single private network across zones and regions, so you can use their private networking anywhere.

Finally, pricing is important, and an oft-forgotten cost is network transfer fees. Transfer is free within zones for all providers, but only Softlayer charges nothing for data transfer across zones and even across regions. This is a big saving.

David Mytton is the founder and CEO of Server Density, a cloud management and server monitoring specialist. He can be contacted at david@serverdensity.com or followed on Twitter @davidmytton.

Featured image: Shutterstock/ssguy



Comments

  1. Thank you David, these specs will be quite handy in making the correct decisions about Cloud Computing Speed vs Cost

  2. Reblogged this on BigData Admin and commented:
    Great reference article for comparing Cost vs Speed in Cloud Computing decision making.

  3. Great article David, I am missing Azure in the comparison though, any thoughts on that?

    /martin

    1. Unfortunately I didn’t do any testing on Azure – would be interesting to see how it compares.

  4. latency and CPU overhead is the next article coming, right?

  5. Brad McConnell Monday, April 14, 2014

    One thing I’ll mention here is that the different flags available in iperf are useful when you’re attempting to test different things. The test used here focuses on single stream TCP performance between two endpoints – in a test like this if all other things are equal (tcp tuning, end to end physical capacity, system performance), the end result is largely determined by the distance between the two endpoints.

    When attempting to discover how much aggregate capacity is available between the same two nodes, it’d be beneficial to send parallel streams at the same time (via -P 20 or similar at the end of the client’s cmdline.) This would make iperf initiate 20 total streams simultaneously, and while each one would be limited to the same “total distance affects TCP performance” situation cited above, their aggregate capacity is more likely to inform you if your test is hitting an actual network QoS policy or physical bottleneck somewhere along the way.

    Thank you for the information provided, though. It’s very insightful!

    1. Good point, that would also help with testing the ability to make use of multiple CPUs and further performance.

  6. Why no test for Azure, the main Amazon competitor? Maybe because it is the fastest?! :)

  7. It seems like iperf cannot utilize the full potential of Google’s network. I reran some of the tests using the -P switch in iperf and got much better results. How much better? Dramatically better ;-)

    Full test results: http://doit-intl.com/blog/2014/4/16/need-for-speed-testing-googles-networking-performance

    1. Yes, it is important to test multi-threaded performance because with a single thread you may be hitting the limits not of the network throughput but of single-threaded performance. However, the more threads you use, the more CPU they consume, so there’s a tradeoff with what is left for your application in a real-world use case.

      Very interesting to see the inter-region performance though, that’s a good one to try and discover the maximum throughput.
