Why do tech-savvy companies like Heroku, Pinterest, Airbnb, Instagram, Reddit, Flipboard, and Foursquare keep so much of their computing horsepower running on Amazon's aging US-East infrastructure, given its problematic track record? US-East experienced big problems again on Monday, affecting those sites and more. The latest snafu follows other outages in June and earlier.
Why they're sticking with US-East, especially since Amazon itself preaches distributing loads across availability zones and geographic regions, is the multimillion-dollar question that no one at these companies is addressing publicly. But there are some pretty safe bets as to their reasons. For one thing, Ashburn, VA-based US-East came online in 2006 and is Amazon's oldest and biggest data center (or set of data centers). That's why a lot of big, legacy accounts run there. Moving applications and workloads is complicated and expensive, given data transfer fees. Face it: inertia hits us all. Take a look at your own closets and you'll probably agree. Moving is just not easy. Or fun.
Data gravity is one issue. “If you’ve been in US-East for a while, chances are you’ve built up a substantial amount of data in that region. It’s not always easy to move data around depending on how the applications are constructed,” said an industry exec who’s put a lot of workloads in Amazon and did not want to be identified.
In addition, the dirty little secret to the world at large is that many applications running on AWS “are really built with traditional data center architectures, so moving them around is akin to a data center migration — never an easy task in the best of circumstances,” he added. While most companies want to run applications and services in multiple venues, the complexity of doing so can be daunting, he said. He pointed to a post-mortem of an April 2011 Heroku outage as an example.
US-East by default
Vitaly Tavor, founder and vice president of products for Cloudyn, a company that helps customers make the best use of Amazon services, said the deck is still stacked in US-East's favor nine months after Amazon's new Oregon data center was activated. For one thing, the AWS console directs customers to US-East by default. So if you don't know better, your stuff is going to go there, he said.
The US-West 2 data center, in Oregon, is newer than US-East but also smaller. Tavor suspects that Amazon may tell very large customers not to move there. "Oregon is much smaller than US East so if you're a company of Heroku's size and need to suddenly launch lots of instances, Oregon might be too small," he said. And US-West 1, in California, is more expensive than either of the other two because of the region's higher energy and other costs.
For the record, as of Tuesday morning, Amazon was still sorting out residual issues from the problem, which surfaced Monday at 10:30 a.m. PDT, according to its status page:
4:21 AM PDT We are continuing to work on restoring IO for the remainder of affected volumes. This will take effect over the next few hours. While this process continues, customers may notice increased volume IO latency. The re-mirroring will proceed through the rest of today.
I have reached out to several of the affected companies and to Amazon itself and will update this if and when they respond. Of course, Amazon competitors are having a field day. Check out Joyent's mash note to Reddit.