Most of the attention around the Amazon Web Services outage focused on the more popular sites. But they were only a handful among the hundred-plus affected, including at least three popular platform-as-a-service (PaaS) providers: Heroku, Engine Yard and DotCloud. That’s because many PaaS offerings are hosted on AWS, essentially adding a developer-friendly layer of abstraction on top of the AWS infrastructure to make writing and deploying applications even easier. The downside, of course, is that as goes AWS, so goes your PaaS provider.
And that’s exactly what happened today with two very popular PaaS offerings, Heroku and Engine Yard, and the up-and-coming DotCloud. InformationWeek detailed the Engine Yard situation, which was mitigated by the company’s wise decision to adopt a multi-region failover strategy encompassing data centers outside the US-EAST region affected today. Heroku has been issuing status updates all day as it works to restore its service while AWS makes improvements on its end. In a very informative blog post, DotCloud detailed how to achieve maximum availability with AWS, explained what can and did go wrong, and alluded to plans for preventing similar problems for DotCloud users in the future.
As you might have noted in the myriad headlines generated by this outage, though, PaaS providers are hardly the most noteworthy services that were down today. An impromptu site called http://ec2disabled.com/ has compiled a list of sites (145 as of 4:40 p.m. PDT) that were affected by the outage, ranging from About.me to Zencoder. (A big hat tip to Reuven Cohen for pointing me to this URL.)
As for the status at ground zero, AWS’s US-EAST region, things seem to be getting better. The last update about the EC2 and Elastic Block Store services, at 1:48 p.m. PDT, read:
1:48 PM PDT A single Availability Zone in the US-EAST-1 Region continues to experience problems launching EBS backed instances or creating volumes. All other Availability Zones are operating normally. Customers with snapshots of their affected volumes can re-launch their volumes and instances in another zone. We recommend customers do not target a specific Availability Zone when launching instances. We have updated our service to avoid placing any instances in the impaired zone for untargeted requests.
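The recovery path AWS describes here, recreating an EBS volume from a snapshot in a healthy zone and launching replacement instances without pinning them to a specific zone, can be sketched roughly as follows. This is a hypothetical illustration using the parameter names of the modern boto3 SDK (which postdates this outage; the 2011-era EC2 API exposed the same operations under different tooling), and all IDs are placeholders.

```python
# Hedged sketch of AWS's recommended recovery steps. Assumes boto3-style
# parameter names; snapshot, AMI and zone values are placeholders.

def volume_request(snapshot_id, healthy_zone):
    """Parameters for ec2.create_volume: recreate an EBS volume from a
    snapshot in an unaffected Availability Zone."""
    return {"SnapshotId": snapshot_id, "AvailabilityZone": healthy_zone}

def instance_request(ami_id):
    """Parameters for ec2.run_instances. Note that no Placement (and thus
    no Availability Zone) is specified, so EC2 places the instance itself
    and, per the status update, steers it away from the impaired zone."""
    return {
        "ImageId": ami_id,
        "MinCount": 1,
        "MaxCount": 1,
        "InstanceType": "m1.small",  # era-appropriate instance type
    }

# With credentials configured, the actual calls would look like:
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   ec2.create_volume(**volume_request("snap-12345678", "us-east-1b"))
#   ec2.run_instances(**instance_request("ami-12345678"))
```

The key point of the sketch is the second request: leaving the zone untargeted is what lets AWS route new capacity around the impaired Availability Zone.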
Relational Database Service customers have something to smile about, too:
2:35 PM PDT We have restored access to the majority of RDS Multi AZ instances and continue to work on the remaining affected instances. A single Availability Zone in the US-EAST-1 region continues to experience problems for launching new RDS database instances. All other Availability Zones are operating normally. Customers with snapshots/backups of their instances in the affected Availability zone can restore them into another zone. We recommend that customers do not target a specific Availability Zone when creating or restoring new RDS database instances. We have updated our service to avoid placing any RDS instances in the impaired zone for untargeted requests.
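The RDS advice follows the same pattern: restore from a snapshot and let the service pick the zone. A rough sketch, again using the parameter names of the modern boto3 SDK (which did not exist at the time) and placeholder identifiers:

```python
# Hedged sketch of the RDS recovery path from the status update. Parameter
# names follow boto3's rds.restore_db_instance_from_db_snapshot; the
# instance and snapshot identifiers below are placeholders.

def restore_request(new_instance_id, snapshot_id, target_zone=None):
    """Parameters for restoring an RDS instance from a snapshot. Per AWS's
    recommendation, leave target_zone as None so the service avoids the
    impaired Availability Zone automatically."""
    params = {
        "DBInstanceIdentifier": new_instance_id,
        "DBSnapshotIdentifier": snapshot_id,
    }
    if target_zone is not None:  # only pin a zone if you really must
        params["AvailabilityZone"] = target_zone
    return params

# With credentials configured, the actual call would look like:
#   rds = boto3.client("rds", region_name="us-east-1")
#   rds.restore_db_instance_from_db_snapshot(
#       **restore_request("mydb-recovered", "mydb-snapshot"))
```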
As Stacey noted this morning, though, the response from the affected sites has been pretty good-natured, probably because, as Quora astutely put it in its error message: “we wouldn’t be where we are today without EC2.” I think many sites feel the same way, and they won’t be abandoning AWS anytime soon, if only because there are few, if any, better options in terms of availability. But, like DotCloud, they’ll start thinking about more advanced failover options if they really want their customers to take them seriously going forward.