
RackSpace Outage Hits Home


Update: A truck driver, because of a medical condition, drove into a power transformer in San Antonio, Texas, this evening, causing it to explode. That explosion caused a major power disruption, and the power company cut power in response, which ultimately took down the Dallas/Fort Worth-based data center of RackSpace, our hosting company. Rackspace is based in San Antonio. This is the second time in less than a week that they have had power issues. Rackspace made the following announcement:

Without notifying us the utility providers cut power, and at that exact moment we were 15 minutes into cycling up the data center’s chillers. Our back up generators kicked in instantaneously, but the transfer to backup power triggered the chillers to stop cycling and then to begin cycling back up again—a process that would take on average 30 minutes. Those additional 30 minutes without chillers meant temperatures would rise to levels that could irreparably damage customers’ servers and devices. We made the decision to gradually pull servers offline before that would happen. And I know we made the right decision, even if it was a hard one to make.

Even though we are mostly hosted on, certain parts of the site are coming off of the RackSpace infrastructure. This prevented all our network sites from loading properly. Our email servers went down as well.

Everything seems to be back to normal, but it leaves me with one simple observation: our Internet infrastructure, despite all the talk, is as fragile as a fine porcelain cup on the roof of a car zipping across a pot-holed goat track. A single truck driver can take out sites like 37Signals in a snap.

39 Responses to “RackSpace Outage Hits Home”

  1. I was a big fan of Rackspace, always telling anyone who was happy to listen, and even those who were not ;-))

    But November was a first loss of my 150% trust, and this week is the end of it. Our server was potentially compromised, so on the advice of a Rackspace engineer we decided to rebuild it.

    The rebuild took far too long, over a day, and then we realised that we had no backup after the 16th of January: 5 days with no backup, and yes, we do have Managed Backup with Rackspace.

    So now I have a server that is partially restored. Emails are back online, but we have lost 7 days of them, which is significant.
    On top of that, we have lost one very precious directory of reference data. There was no other backup or copy because it was confidential and was supposedly backed up.

    My question to Rackspace is how a failing Managed Backup can go unnoticed for 5 days. I have told them this, of course. Why is there no alert defined if the volume backed up is suddenly less than 50% of the normal volume?

    All this to say that I am actively looking for another host for a dedicated server, as my level of trust has reached the bottom.

    I have actively defended Rackspace to our board of directors, but this time I can’t see what excuse I can find for this.

    Sorry guys at Rackspace, but not good enough ;-(


    PS: I do not work for any competitor of Rackspace, and I have no friends, family, or acquaintances at any competitor or related company. I say that in case someone thinks it might be the case.

  2. I need to add that placing all your trust in one place is dangerous. No one can predict or defend against every scenario. Rackspace is head and shoulders above any other hosting company I have used but that should never replace thorough and detailed disaster planning and testing. I am using this situation as an incentive to do just that.

  3. We have hosted with Rackspace for 3 years and they have been fantastic. This outage really hit us hard though. It nuked the boot drive on a RAID array of our database server and our 200 customers were offline for 21 hours. We worked all night and all day to restore the DB environment. It has definitely shaken the trust of some of our customers. This could put a fragile company out of business.

  4. A Voice of Reason


    Dear Matt/Jim/Bruno,

    I’m truly impressed by your trust in your neighbor’s recommendation and your ability to negotiate a contract and migrate your services over to Verio in less than 24 hours.

    Next time you try to slam a competitor be sure you don’t leave your name and company URL in your signature file. Your post is full of lies (and spelling errors to boot).

    Nice try.

  5. Did anybody actually read Rackspace’s comments on what happened? The truck did not cause the outage. It was the power company.

    6:30 PM CST Monday, a vehicle struck and brought down the transformer feeding power to the DFW data center. It immediately disrupted power to the entire data center and our emergency generators kicked in and operated as intended. When we transferred power to our secondary utility power system, the data center’s chilling units were cycled back up. At this time, however, the utility provider shut down power in order to allow emergency rescue teams safe access to the accident victim.

  6. Suddenly Leigh Anne wants his 99.999% advertisement removed :)
    Anybody who has worked in a datacenter knows that there is only so much you can get redundant without the cost rising.

  7. The outage occurred in Dallas, not San Antonio (it’s the second sentence in the article.) Rackspace is based out of San Antonio and has data centers all over the place.

  8. I’ve been with Rackspace nearly 6 years and this is a first for me. Even still, I only lost one server (my other is with them in San Antonio) for only about an hour last night. Rackspace was responsive and things were back online reasonably quickly.

  9. I see all this negative hype that RackSpace is getting for a power outage caused by a guy (supposedly) having a heart attack at the time of the crash.

    I start thinking “How different is this incident from your household electricity shutting off when a lightning storm is in your neighborhood?” Sure, you get upset and immediately call the electric company because the outage disrupted your favorite TV show and you’ve been waiting for this episode for over a month.

    What are you going to do now, cancel service tomorrow and hook up with another electric service? Will that guarantee perfect service in a perfect world? Grow up!! Sh*t happens.

    When you get a flat tire, do you blame the highway department for letting debris get on the roads or do you jump right in and sue the tire manufacturer? Will that fix the flat? Give me a break.

  10. I have to honestly say that web site hosting (Rackspace et al.) hardly counts as Internet infrastructure IMHO … That said, the leaves (edges) of services are lined with single points of failure (services that are not redundant). But none of those services could reasonably count as infrastructure IMHO …..

  11. Rackspace’s “zero downtime” is a lie, as is their fanatical support that is outright terrible. My neighboer told me to call ntt/verio. I have already taken my services over to them. My advice is to call them – NTT/Verio at 866-341-7867 and ask for Bruno.

  12. Matt Harwood

    Can I just ask, as a human being, anyone know if the driver is OK? I see all this stress over downtime – but a man is involved in an explosion and I haven’t found one report as to his state of health!

    Unless I’m missing something huge here, it makes me sad people now care more about virtual products than physical people.

  13. Amazing… we had a site launch yesterday, it went down just an hour after it was launched. (With a really happy client seeing it going down)

    We do have two dedicated servers in there. The funny thing is that both servers ended up with fried hard drives, and Rackspace performed a restore on one of them with a faulty backup file… I mean, it could be safer to launch from my computer at home!!!!

  14. We have hosted with Rackspace for some years now, and in my experience they have been growing way too fast, so the experienced, high-quality administrators from 2 years back are just not accessible anymore. Instead, the administrators who are supporting you have very superficial knowledge of the systems they are supposed to manage. In times of trouble, the B-team (as we call them) is not very reliable, and in some cases they just panic.

  15. Chris Scott

    I was also down for 3 hours last night.

    They shut down our servers due to the heat at the datacenter. From what we understand:
    In the second incident at approximately 6:30 PM CST Monday, a vehicle struck and brought down the transformer feeding power to the DFW data center. It immediately disrupted power to the entire data center and our emergency generators kicked in and operated as intended. When we transferred power to our secondary utility power system, the data center’s chilling units were cycled back up. At this time, however, the utility provider shut down power in order to allow emergency rescue teams safe access to the accident victim. This repeated cycling of the chillers resulted in increasing temperatures within the data center. As a precautionary measure we decided to take some customers’ servers offline. These servers are now back up, as are the chillers.

    So it seems the redundant systems worked, power and all, but the chillers failed when they had to cycle them multiple times because of the accident victim.

    Although all of our servers and our image suffered, I can’t say enough good things about Rackspace and what they’ve done for us. I mean, compared with all my experiences with datacenters (esp. The Planet), they handled everything as well as I could ask for. They’ve gone above and beyond with any support request my team and I have had, and they are simply… Fanatical, as much as I can expect them to be.

  16. You can’t possibly expect 100% uptime for a single location, regardless of the redundancy built into that infrastructure.

    This is why multi-site architectures (failover or active-active) are used by everyone for whom downtime really matters. It is also why the 100% guarantee from Rackspace only guarantees that you will get SLA refunds.

  17. Rackspace has shown that they are a marketing gimmick on steroids with these outages. A single truck hitting a power pole taking out their data center shows their lack of redundancy planning.

  18. I’ve worked on and off with Rackspace for almost 7 years and true to their claim I’ve never faced serious downtime issues. They also have sat patiently while addressing problems during server migrations, etc. with my IT staff.

    Personally, I couldn’t imagine the embarrassment suffered by a CEO attempting to showcase their online business to investors only to find their server’s gone MIA. However, I also know that Rackspace’s 100% uptime guarantee comes with a solid SLA, one that in times like last week they will make good on.

    I wouldn’t let years of trustworthy service erode so quickly.

  19. rehanyarkhan

    Rackspace has always been, and still is, an extremely hyped service. Scratch the surface at Rackspace and there is no quality. If you have 2 servers then Rackspace is OK; otherwise their so-called support is not worth it. And now this amazing failure!

  20. Well, get redundancy across data centers. The problem is that redundancy is non-trivial to implement on both the software side and the interconnection side, and it will cost. How much is your business worth to you? If you can’t do it properly, or it costs too much to do yourself, host the site with people who have implemented redundancy for you: Google, Amazon Web Services, et al. As for the backup power supplies, if you don’t test, chances are the backup isn’t as redundant as you thought it was: batteries die, breakers don’t break, switches fail.

  21. Well, our team was engaging key investors this Sunday/Monday and was also in the middle of our biggest outreach program since launch. And Rackspace let us down on Sunday morning for four hours (no server, email, nothing…emails bounced back! We basically didn’t exist!). And we were not even notified when it happened!

    And then, after 24 hours of me (CEO) explaining the situation to countless people…and assuring them that it was a rare one-off circumstance that would never happen again…IT HAPPENS AGAIN. Our server is still down right now.

    In all seriousness, this could destroy a business. Rackspace’s whole “zero downtime” guarantee has actually been almost 10 hours of downtime in the past 48 hours (not to mention GREAT costs to the credibility and revenues of many businesses out there including my team).

    What corners have they cut with back-up systems, generators, etc!? Truly destructive.
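
The first comment asks why no alert fires when the volume backed up suddenly drops below 50% of normal. The check itself is small. Here is a minimal sketch in Python of what such a check might look like. It is not anything Rackspace's Managed Backup is known to do; the function name `backup_volume_alert`, the use of a median over recent backup sizes as the "normal" baseline, and the default threshold of 0.5 (taken from the commenter's "50% of the normal volume") are all assumptions for illustration.

```python
from statistics import median


def backup_volume_alert(history_bytes, latest_bytes, threshold=0.5):
    """Return True if the latest backup looks suspiciously small.

    history_bytes: sizes of recent successful backups (hypothetical input).
    latest_bytes:  size of the backup that just finished.
    threshold:     fraction of the baseline below which we alert (assumed 0.5).
    """
    if not history_bytes:
        # No baseline yet, so there is nothing to compare against.
        return False
    # The median is less sensitive than the mean to one odd-sized backup.
    baseline = median(history_bytes)
    return latest_bytes < threshold * baseline


if __name__ == "__main__":
    recent = [100_000, 110_000, 90_000]
    # 40,000 is below half of the 100,000 baseline, so this should alert.
    print(backup_volume_alert(recent, 40_000))
```

A check like this would run after every backup job, and a True result would page someone rather than sit in a log. It only catches shrinking backups; a backup that never runs at all needs a separate "last success was too long ago" check.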