Updated: Oh to be a fly on the wall for the conversations that must be going on between Netflix and Amazon engineers this holiday season.
If you’re not a Netflix subscriber, you may not yet know that issues at Amazon’s US-East data center facility took down Netflix’s streaming service on Christmas Eve, arguably the worst possible time. Starting at 1:50 p.m. PST, as GigaOM’s Janko Roettgers reported, Amazon’s US-East facility reported issues with its Elastic Load Balancing service that carried over into Christmas morning. Interestingly, Amazon’s Prime Instant Video streaming service, which competes head-on with Netflix and also runs on AWS, appeared to be unaffected by the US-East snafus.
Wait… AWS outage took Netflix offline, but Amazon video stayed up? What the huh?
— Rafe Needleman (@Rafe) December 25, 2012
Update 9:55 a.m. PST: One commenter from Mass. reported his Amazon Prime Instant Video was down for two days. I was able to access that service this morning with no problem. Stay tuned for updates on this.
The latest update to the AWS status page reads:
Dec 25, 4:36 AM PST We continue to work on resolving issues with the Elastic Load Balancing Service in the US-EAST-1 region. These issues are affecting updates to both existing and newly created ELBs. A subset of ELBs that made configuration changes or changes to registered instances during the event are experiencing errors or receiving reduced traffic. We continue to work toward a full recovery of the service. We apologize for the continued impact.
Other sites, including Heroku’s Platform as a Service, were also affected. Heroku, like Netflix, has been down this path before with previous AWS US-East glitches.
This, the latest of several problems at Amazon’s Ashburn, Va., facility, highlights a couple of big, recurring issues for Amazon, its partners, rivals, and customers.
1: US-East is Amazon’s largest and oldest data center facility and, perhaps not coincidentally, it’s also the facility at ground zero of most of the AWS-related outages over the past few years. Still, many customers feel they have no choice but to deploy there since it’s usually the first AWS data center to host new services. (For example, Amazon’s new high-storage instance types announced last week are only available from US-East for now.) And US-East tends to be less pricey than Amazon’s US-West facilities in California and Oregon.
2: Working with AWS now is a lot like a software company partnering with Microsoft in the ’80s and ’90s: it’s both your biggest partner and your biggest rival, so tread carefully. At AWS re:Invent last month, Amazon CEO Jeff Bezos touched on this topic of “coopetition.” Amazon Prime Instant Video competes with Netflix but “we bust our butts every day for Netflix,” Bezos said.
3: Issues like this one can only help AWS rivals in the OpenStack community (Rackspace, Hewlett-Packard et al.) that are trying to position their cloud services as options with better service and support, if not the same huge scale as AWS. Partners might also take a harder look at other infrastructure providers like SoftLayer and Joyent. Just saying.
Update: At 8:45 a.m. PST Dec. 25, Netflix tweeted:
Special thanks to our awesome members for being patient. We’re back to normal streaming levels. We hope everyone has a great holiday.
— Netflix US (@netflix) December 25, 2012
Update: At 6:49 p.m. PST Dec. 26, an AWS spokeswoman got back with a statement:
“On December 24, AWS experienced issues with the Elastic Load Balancing service that impacted some customers in the US-East region. Impacted customers started to recover the evening of December 24 and the service was fully recovered and functioning correctly on December 25. We have been heads down ensuring customers are operating smoothly and will be publishing a full summary of the event in the coming days.
Amazon Instant Video wasn’t significantly impacted because it didn’t need to take any Amazon Elastic Load Balancing scaling events during the time there were issues with the Elastic Load Balancing service in US-East. Only Elastic Load Balancers that were scaling up or down had issues during that time period.”