
Summary:

Not all cloud infrastructure is the same, as David Mytton discovered when he started looking into Google Cloud. The service differs markedly from AWS and SoftLayer in these five key ways.


At Server Density, we process over 30TB of time series data points each month, so I’m always experimenting with different technologies to see if we can do things more efficiently or cheaply. Our current environment spans two SoftLayer data centers, using a mix of dedicated and cloud servers. But recently I’ve been playing around with Google Cloud.

I think Google is currently in the best position to challenge Amazon because it has the engineering culture and technical ability to release some really innovative features. IBM has bought into some excellent infrastructure at SoftLayer but still has to prove its cloud engineering capabilities.

Amazon has set the standard for how we expect cloud infrastructure to behave, but Google doesn’t conform to these standards in some surprising ways. So, if you’re looking at Google Cloud, here are some things you need to be aware of.

1. Google Compute Engine Zones are probably in Ireland and Oklahoma

In 2012 Google released impressive internal photos of its data center facilities and mapped them out. However, the Compute Engine zones are very non-specific, e.g. “europe-west1-a”. Indeed, Google has only two geographical regions (Europe West and US Central) compared to Amazon’s nine. In addition to its 13 existing locations, SoftLayer has announced 15 new data centers just for this year.

Google’s networking is very opaque. If you traceroute an Amazon or SoftLayer instance, you can see where traffic is going, the network providers, and usually the locations of the routers. In contrast, Google goes into its network at the closest POP, and everything else is very hidden.

It’s possible to guess where Google is locating its cloud. A test of a Google Compute Engine instance showed round trip responses within 20ms from London, UK. If we compare that to pings from London to the three European countries where Google has facilities — Ireland, Belgium and Finland — we can rule out Belgium and Finland because the ping round trip time is too high. Only the Ireland facility is close enough.

Ping RTT from London:

    Google europe-west1-a          20ms
    Amazon eu-west (Ireland)       22ms
    Belgium (0.be.pool.ntp.org)    38ms
    Finland (0.fi.pool.ntp.org)    49ms
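
The measurement itself is easy to reproduce: run a batch of pings from a fixed vantage point and compare the averages. Here’s a minimal Python sketch of that methodology, assuming the Linux ping output format; the first two host entries are placeholders for your own test instances:

```python
#!/usr/bin/env python
"""A minimal sketch of the RTT methodology above.

Assumes the Linux ping output format. The first two entries are
placeholders: substitute the external IPs of your own test instances.
"""
import re
import subprocess

HOSTS = [
    ("gce-europe-west1-a", "YOUR-GCE-INSTANCE-IP"),   # placeholder
    ("aws-eu-west",        "YOUR-EC2-INSTANCE-IP"),   # placeholder
    ("belgium",            "0.be.pool.ntp.org"),
    ("finland",            "0.fi.pool.ntp.org"),
]

def rtt(host, count=20):
    """Return (avg, mdev) in ms, parsed from the ping summary line:
    rtt min/avg/max/mdev = 19.8/20.1/21.0/0.396 ms
    """
    out = subprocess.check_output(["ping", "-c", str(count), host])
    match = re.search(r"= [\d.]+/([\d.]+)/[\d.]+/([\d.]+) ms", out.decode())
    return float(match.group(1)), float(match.group(2))

for name, host in HOSTS:
    avg, mdev = rtt(host)
    print("%-20s avg %6.1f ms   mdev %6.3f ms" % (name, avg, mdev))
```

The mdev column ping reports is the same mean deviation figure quoted in section 5 below.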

Given that the release of the Europe zones in December 2012 happened just a few months after the Ireland facility came online in September 2012, I’d bet that the Europe West location is in Ireland.

The US locations are more difficult to pin down because Google has multiple facilities in what you could call the “central” United States. However, given that ping times from London to SoftLayer Dallas are so similar to London to Google us-central1-a, I’d guess that this means us-central1 is the Mayes County, Okla., location, which is around 270 miles from Dallas.

Ping RTT:

    Google     europe-west1-a to us-central1-a            111ms
    SoftLayer  Amsterdam to Dallas 1                       112ms
    Amazon     eu-west (Dublin) to us-east (Virginia)       95ms

2. Google’s Compute Zones may be isolated, but they’re surprisingly close together

One of the key architectural decisions of AWS is that you deploy across multiple zones in a region, but for complete redundancy, it’s a good idea to deploy across multiple regions. There have been instances where entire AWS regions have suffered outages, even though each zone is supposed to be isolated. Amazon acknowledges as much: “Although rare, failures can occur that affect the availability of instances that are in the same location.”

Compare that to Google: “Each region is completely isolated from other regions, and each zone is completely independent of other zones. If a zone or a region suffers a failure, other zones and regions won’t be affected.”

    London to europe-west1-a              20ms
    London to europe-west1-b              18ms
    europe-west1-a to europe-west1-b      0.032ms

The ping RTTs reveal that although the times from London to west1-a and west1-b differ slightly, the latency between the two zones is so low that they must be in the same geographical area. Light can only travel about 9.5km in 0.032ms, and since that figure is a round trip, the actual separation is at most half that. But if Google’s availability statement is correct, the west1-a and west1-b zones cannot be within the same physical facility. So Google must have separate data centers within Ireland, no more than a few kilometers apart.
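
That bound is easy to sanity-check with a back-of-the-envelope calculation. This sketch assumes the vacuum speed of light, so it is a generous upper bound; signals in fibre travel at roughly two-thirds of that:

```python
# Upper bound on the distance between two zones from their ping RTT.
# Uses the vacuum speed of light; real fibre is ~2/3 c, so the true
# separation would be even smaller.
C_KM_PER_MS = 299792.458 / 1000  # speed of light, km per millisecond

rtt_ms = 0.032                         # measured west1-a <-> west1-b RTT
round_trip_km = C_KM_PER_MS * rtt_ms   # ~9.6 km
one_way_km = round_trip_km / 2         # ~4.8 km

print("Maximum zone separation: %.1f km" % one_way_km)
```

Accounting for the slower speed of light in fibre would shrink that bound to roughly 3km.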

3. Scheduled maintenance takes zones offline for up to two weeks

One of the surprising things about Google zones is that they have regular maintenance windows, lasting up to two weeks, where all your instances in that zone will be deleted and the entire zone will be unavailable. Since you’ll need redundancy while one of the zones is down, this means that in practice you have to deploy across multiple regions. Since Google only has two regions, you must deploy in Europe and the US.

It’s worth noting that Google supports cloning instances to different zones and regions, so you could shift your workload over to another zone. Google Cloud also allows a persistent disk to be mounted by multiple instances at the same time, although only in read-only mode.
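
If it comes to evacuating a zone ahead of a maintenance window, Google’s gcutil CLI has a moveinstances command for this. Below is a minimal sketch wrapping that CLI from Python; the project, instance and zone names are placeholders, and the exact flag spellings should be verified against `gcutil help moveinstances`:

```python
# A minimal sketch of evacuating a zone ahead of a maintenance window
# by wrapping gcutil's moveinstances command. All names here (project,
# instances, zones) are placeholders, and the flag syntax should be
# checked against `gcutil help moveinstances`.
import subprocess

subprocess.check_call([
    "gcutil", "--project=my-project",
    "moveinstances", "web-1", "web-2",
    "--source_zone=europe-west1-a",
    "--destination_zone=europe-west1-b",
])
```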

Nevertheless, this requires your design to be completely fault tolerant, failing over transparently between zones. That is the premise you should operate on with any hosting provider, cloud environments especially, but Amazon and SoftLayer are much more traditional in the sense that zones only go offline in the event of an outage. Google maintenance events happen every few months!

4. You cannot guarantee where your data will be located

Although you can choose your region to be the US or Europe, there is no guarantee that your data will remain in that region while at rest. This is problematic if you have data protection requirements, for example if certain data cannot leave the US or the EU.

5. Connectivity across regions isn’t fast

Google’s fiber ownership should mean that Google has a big advantage when it comes to networking. In fact, Google touts this on the Google Compute Engine homepage: “Create large compute clusters that benefit from strong and consistent cross-machine bandwidth. Connect to machines in other data centers and to other Google services using Google’s private global fiber network.”

Within a single region, the connectivity is very fast, but you’d expect this if the instances are just a few kilometers away. However, between regions, it’s not quite so good.

                                                      Ping RTT    1GB file over scp
    Google     europe-west1-a to us-central1-a          111ms        1.8MB/s
    SoftLayer  Amsterdam to Dallas 1                     112ms        3.6MB/s
    Amazon     eu-west (Dublin) to us-east (Virginia)     95ms        2MB/s

(Tested with Google n1-standard-2, Amazon m1.small and SoftLayer cloud 2CPU / 1GB RAM)
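
The bandwidth figure is also straightforward to reproduce: time an scp of a fixed-size file between two instances and divide. A minimal sketch, with the remote host and paths as placeholders:

```python
# A minimal sketch of the cross-region bandwidth test. The remote host
# and paths are placeholders; create the 1GB test file first, e.g.:
#   dd if=/dev/urandom of=/tmp/testfile bs=1M count=1024
import subprocess
import time

SIZE_MB = 1024.0
REMOTE = "user@us-central1-instance:/tmp/"  # placeholder

start = time.time()
subprocess.check_call(["scp", "/tmp/testfile", REMOTE])
elapsed = time.time() - start

print("%.1f MB/s (%.0f seconds for %dMB)" % (SIZE_MB / elapsed, elapsed, SIZE_MB))
```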

This isn’t specific to Google; there are physical limits on worldwide network speeds. Still, while the latency was extremely consistent (a mean deviation of only 0.396ms), the bandwidth is not so impressive.

GCE is worth a closer look

Google Compute Engine has some very nice features: friendly APIs, a good control panel, fast instance booting, flexible persistent disks, flexible network routing, good internal network performance, etc. Google also offers SLAs for every service, whereas Amazon only offers them on EC2. Reviews have also indicated that its performance is generally better than EC2’s, and I’m looking forward to testing it with some real workloads, particularly the disk I/O performance. However, if you’re coming from EC2 or other cloud providers, you need to make sure you read the docs properly. Some things are just very different, such as the two-week maintenance windows.

Given how fast Amazon releases new features, Google needs to keep up and accelerate the pace of its own releases (see its release notes for an idea of the velocity of development). It needs to be more open about how things work: data center locations, network routing and performance. It’s disappointing that Google only offers two regions, so I hope it opens up access to more of its 12 worldwide data centers soon. There’s a great opportunity to open up some of its specialist tools too, as it has done with BigQuery.

Overall, I’m excited to see Google compete in this market. It joins Amazon and IBM SoftLayer as the ones to watch now.

David Mytton is founder and CEO of Server Density which offers a SaaS tool to provision and monitor infrastructure. He can be reached at david@serverdensity.com or followed on Twitter: @davidmytton

  1. meanderingmark Sunday, March 2, 2014

    I think you missed three decimal places in your light speed calculation

    1. meanderingmark Sunday, March 2, 2014

      Oh, nope my mistake, ignore me

  2. Jim Haughwout Sunday, March 2, 2014

    It is not ideal if I cannot tell or control where my data is. This is not so much a control issue, but it inhibits my ability to comply with things like the UK and EU Data Protection Acts, and any enterprise use in a regulated industry (defence, health care/pharma, finance, insurance)

  3. Johan Glantzberg Tuesday, March 4, 2014

    Thank you for your splendid analysis!
    It seems some of the big cloud providers still have a journey ahead before they can offer full “Enterprise” capabilities (at least to Enterprises outside the US).

    If you are located within the EU and aiming to use cloud capabilities for (part of) your IT, it is (still) worthwhile to check what your local/regional cloud providers can offer.

  4. Hi David, I just came across this piece. It’s a cool bit of detective work, but you have to be careful drawing too many conclusions from ping tests within the Google network.
    Unfortunately, as you say, we don’t have live migration in Europe, but it’s coming soon! I’d also like to point out that while *project data* may leave European data centres, actual app data won’t. The wording could maybe be clearer, and indeed, this point is more clearly made in the contract.
    Happy to chat offline at some point.
    Tom Grey (Google Enterprise Solution Engineering)

    1. Would be interesting to learn why internal Google pings are different from external! Sounds like some clever networking going on, perhaps?

      Regarding the project data vs app data, this distinction definitely needs to be made clearer as I’m still not sure what that actually means. I’m guessing “project” means things like meta data (names of servers or billing info, for example) whereas “app data” means things like the contents of persistent disks? Is that correct?

      1. Basically, yes :)

    2. Hi Tom,

      You wrote “this point is more clearly made in the contract”.

      I read the terms and I see nothing about actual app data staying in Europe when using an European datacenter: https://developers.google.com/cloud/terms/

      Any insight on this?

      Thanks

