Summary:

Gigaom got a tour of AT&T’s disaster recovery drill in Chicago, an exercise in restoring communications in the unlikely event that an entire core network goes down.

AT&T Hazmat (Photo: AT&T)

Anyone who has ever seen footage of a hurricane’s aftermath is probably now familiar with the COW, or Cell on Wheels. These temporary cell sites stand in for towers and base stations knocked out by storms (and boost mobile capacity at the Super Bowl), but what happens if a severe storm, earthquake or terrorist attack takes down a far bigger chunk of the communications network?

This week AT&T’s National Disaster Recovery team gave me a sneak peek of an exercise in Chicago to prepare for the most severe outage scenario in its network short of its Global Network Operations Center in New Jersey going offline. The drill was designed to explore how AT&T would deal with a disaster that took out an entire metro central office.

Photo: AT&T

A central office may sound like an admin building, but in telco-speak it’s the term used for those huge windowless buildings packed to the gills with core infrastructure – the terminus for a carrier’s metro fiber lines and way station for all phone conversations. Knocking out a major CO like AT&T’s 27-story concrete monolith on Chicago’s Canal Street could leave much of the Windy City without a dial-tone or internet connection.

AT&T’s disaster recovery trailers in Chicago’s Soldier Field parking lot (Photo: Kevin Fitchard)

So what does AT&T do in case of such a disaster? It brings in a fleet of trucks hauling what amounts to a complete core network in tow. That means fiber trailers to reconnect a city’s downed optical lines, IP recovery trailers that house massive banks of routers, and power trucks and racks upon racks of generators to feed it all.

An IP recovery trailer packed with enough routing gear to handle 100 Gbps of IP traffic. (Photo: Kevin Fitchard)

A bank of batteries fed by a mobile power plant: All power has to be converted from AC to DC to protect against power surges. (Photo: Kevin Fitchard)

DS3 (or T3) lines. Though much of AT&T’s traffic now travels over fiber, the disaster recovery team still has to restore older copper data connections. (Photo: Kevin Fitchard)

And because an event significant enough to take out a central office is probably going to do a lot of collateral damage to the network, the team also brings mobile base stations: plenty of COWs, COLTs (cells on light trucks) and satellite uplinks to get emergency responders and the general public back on the grid immediately. The teams that put all this together are regular AT&T employees, but they’ve all been trained for these disaster recovery scenarios, said AT&T Senior Network Specialist Kelly Morrison, who ran the Chicago exercise.

An AT&T National Disaster Recovery crew connecting power cables (Photo: AT&T)

AT&T’s Kelly Morrison with a Hazmat suit. A portion of the disaster team is trained to deal with hazardous materials so they can access contaminated facilities. (Photo: Kevin Fitchard)

An emergency communications van with satellite uplink, Wi-Fi and a mobile small cell; it is usually the first vehicle on-site during a disaster. (Photo: Kevin Fitchard)

A major core outage doesn’t happen that often, but it has happened. After 9/11, AT&T’s downtown Manhattan network office suffered a complete failure. The National Disaster Recovery team had to recreate it across the Hudson River in Jersey City, attaching to the same fiber ring that served lower Manhattan.

The retracted tower mast of a COLT (Photo: Kevin Fitchard)

AT&T’s disaster recovery team will tap into local power supplies where available but can run its network off generator power if necessary (Photo: Kevin Fitchard)

Nobody is hoping for another disaster of such scale, Morrison said, but AT&T is prepared for outages of even bigger magnitude. The disaster recovery team has $600 million worth of emergency network equipment at its disposal (enough to build a nationwide communications grid in a small country), all of it distributed across the lower 48 states, where it can be deployed quickly by truck or plane. Fully equipped, Morrison said, the team can assemble a temporary core network capable of handling 15 terabits per second of traffic.

  1. All this, and they’re still terrible.

  2. How about you reach out to Verizon and see if they will show you their Disaster Recovery capabilities?

  3. In the interest of fairness, I would like to see a story on the Netflix Disaster Recovery Team. Of course, ATT is worth almost 10 times what Netflix is, so it has a little more to recover. Maybe the Google DRT would be a better comparison, since their business is worth nearly double ATT.

    1. Depends what they’re doing. Google and Netflix mostly provide service to end customers, while AT&T resells to many giant businesses and provides connections to critical users as well.

      Google no doubt has a decent disaster recovery team, but they mostly sit on top of other carriers. If a US city were badly damaged, they’d simply move to unaffected routes, shifting datacenter capacity overseas if necessary.

      Netflix, I believe, serves mostly over AWS, so they’d simply wait on others. It also helps that Netflix does not have a huge amount of data to store; they can shift to another host if necessary.

    2. When you say Google is worth nearly double ATT, what criteria are you using? If you look at market cap, ATT is worth more.
      Market Cap as of 5/19/2014
      Google = 177.9B
      AT&T = 187.3B

  4. Nice to know they are thinking ahead. Those bad things can and will happen.
    Leslie

  5. Before you believe any disaster recovery team story, ask yourself if the plan includes importing and housing personnel from outside the disaster area, because there will be individuals who will not sacrifice themselves or their immediate families, or invite further property damage, to work the disaster. This goes for police, fire and medical personnel, so the real test is to see how they manage staffing in an area-wide disaster.

  6. I work for a business continuity/disaster recovery company. AT&T is doing things by the book.

    Trucks and generators ready to go is always a good sign. The scale of their contingency and the data it would handle is absolutely mind-blowing.

    Every company, even small businesses, should have a BC/DR plan. For the peace of mind you get, it’s a lot more affordable than you think. Superstorm Sandy changed the minds of a lot of people who thought they had a good contingency plan but really didn’t.