
Summary:

Verizon’s LTE network has had a hell of a month. After a year of smooth performance, interrupted only by one major glitch in April, the new ultra-fast 4G network has experienced a string of three outages in a single month. Verizon’s network head explains why.


Verizon’s LTE network has had a hell of a month. After a year of smooth performance, interrupted only by one major glitch in April, the new ultra-fast 4G network has experienced a string of three outages in a single month, shutting down access to smartphone and wireless hotspot customers across the country. In an interview with GigaOM, Verizon Wireless VP of network engineering Mike Haberman tried to shed some light on the LTE network’s recent problems and explain how Verizon was taking the necessary steps to ensure that they don’t happen again.

Haberman said that LTE is still a brand new wireless technology and Verizon was the first global operator to launch it on a large scale. That means Verizon will be the first operator to encounter the bugs and glitches hiding within any new technology. “Being the pioneers, we’re going to experience some growing pains,” Haberman said. “These issues we’ve been experiencing are certainly regrettable but they were unforeseeable.”

All three outages were caused by problems in Verizon’s service delivery core — in telecom-speak called the IP Multimedia Subsystem (IMS) — which replaces the old signaling architectures used in 2G and 3G networks, Haberman said. While IMS has been around for some time, Verizon’s is the first implementation in an LTE network and it has continued to be a problem spot ever since April, when a software bug originating deep within the IMS core led to a complete failure, kicking LTE customers off both Verizon’s 3G and 4G networks nationwide.

Verizon fixed that software bug, but new IMS glitches have reared their heads – none as big as the one that caused April’s outage, but all taken seriously by Verizon nonetheless, Haberman said. The first outage on Dec. 7 was caused by the failure of a back-up communications database. The second, last week, was the result of an IMS element not responding properly, while Wednesday’s outage was caused by two IMS elements not communicating properly, Haberman said.

So while the LTE radio network was working just fine, customers weren't able to connect to it because the IMS network simply couldn't recognize them. Once Verizon identified an IMS failure, it was able to force phones to stop trying to access 4G and fall back on its 3G CDMA network. But before the switch-over took effect, some customers were left without 3G, as their phones kept trying to log into the 4G network.

Haberman said that once each problem was fixed, it never recurred. Every subsequent outage was the result of a new bug, and it just so happens that December was the month many of these bugs chose to reveal themselves, Haberman said. Verizon's IMS systems are a complex network of databases, servers, routers, gateways and policy managers supplied by multiple vendors. Alcatel-Lucent, Nokia Siemens Networks, Acme Packet and Tekelec all provide different parts, but Haberman declined to identify which particular elements or which particular vendors were responsible for the problems. In fact, Haberman defended Verizon's vendors, saying that they were experiencing the same LTE growing pains as Verizon.

While Verizon won’t promise that no more outages will occur, Haberman said it has taken measures to ensure that they’re minimized when they do happen in the future. He said he’s begun geographically segmenting the LTE network, so if a software bug does break out it can be isolated to a particular region or market instead of spreading nationwide. Verizon is also upgrading all of its software and cutting down on the signaling clutter running over its IMS grid.

“Our goal is to ensure that our 4G network meets the same high standard that our 3G network does,” Haberman said. “We’re not there yet, but we’ll get there.”

As I've said before, Verizon needs to be cut a little slack. LTE isn't an incremental upgrade like HSPA. It's a fundamental rethinking of every aspect of the wireless network: moving from hardware-driven to software-driven base stations, evolving network service delivery systems from old hierarchical, voice-centric chains of gateways to new flat IP architectures, and replacing old copper backhaul links with fiber Ethernet to the tower. And as the first to launch LTE, Verizon will be the first operator to encounter its faults. I'm surprised we hadn't seen a string of outages before December.

But Verizon does have to uphold its claim of having the country's "most reliable network." Many customers pay a big premium to use Verizon's service over its competitors' precisely because of its network performance and coverage. Three outages, even if they were intermittent, during the biggest month of the year for phone sales and activations will hardly help that reputation. Verizon must have had hundreds of thousands of activations in the last week due to Christmas gift giving. Many of those customers probably turned on their phones to discover they had no 4G service.

  1. It seems like geographically segmenting the network should have been in the original design. As a QA Engineer, it shocks me that they tried to run the entire network nationwide without regional backups. As a VZW 4G user, their story about 3G still working when 4G goes down is utter BS! My Thunderbolt never goes to 3G during their outages, it always reverts to 1x.

    1. Actually it's right, my 3G still works. Idk what your problem is.

    2. Verizon actually has to force 3G on their end. For some reason, if 4G is down it will not negotiate the handoff to 3G on its own, so it needs to be forced to, and there isn’t anything you can do about it.

      1. Nothing you can do about it? Sure you can, at least on my Android phone: I can switch to CDMA mode only so it never tries LTE. That's what I did when the issue happened on Wednesday and I was fine… mostly.

  2. I’ll take an occasional growing pain or two when I average 25-32mbps down on my gnex.

    Your voice and SMS are (of course) on 3G all the time. Switching data from 4G to 3G takes all of 30 secs… two clicks and a power cycle.

    1. I’d like to know where/how you’re getting those kinds of dl speeds. I’ve tested mine a couple times a week since I got my GN, and never seen higher than 12mbps. Usually I’m around 9-10mbps. This is in Columbus, OH.

      1. Kevin Fitchard Tuesday, January 3, 2012

        Hi Jackseric, Phonepimp’s averages are probably a rare case. If you’re getting 9-10 Mb/s you’re right in the network-wide average.

  3. I've seen a string of mentions about Verizon. They are fast becoming the Bank of America of their genre. They have replaced intelligent customer service with apologetic customer service, i.e. "I am sorry but there is nothing we can do to help." What is wrong with corporate America today? Well, we already know what Occupy Wall Street thinks, and I am more and more inclined to agree with the premise every single day. You don't have to be good if you have someone by the…well…you know. How do we get this to improve? I am not sure, but I do know one thing: every single day I now explore alternatives, and sooner or later the "Mark Zuckerberg" of wireless and alternative power and fuel sources is going to come along, and I hope I am alive to jump on board. At this point people don't have too much to lose.

    1. So many words and so little “meaning”…

      1. So few words and so little understanding.

  4. Cut slack? Nah. Need to start calling and asking for a prorated credit on lost service. Verizon is probably farming out management of this sea of vendors to their "partners," i.e. the vendors.

    1. +1

    2. Kevin Fitchard Sunday, January 1, 2012

      Yep, I agree that customers should hold Verizon accountable. And considering the amount of downtime customers face, they should probably demand a sizable credit. I wouldn’t be surprised if VZ granted it either.

      I guess what I was trying to say (and probably wasn’t very clear about) is that from a purely tech level, Verizon is in the wild, wild west with LTE. It’s unreasonable to expect that there would be no new problems with such a new and untried technology. At the same time, consumers shouldn’t be Verizon’s beta testers.

    3. $30/month for data access… 30 days in a bill cycle. 1 day outage =$1. Do you value your time so little that the time it takes to make a phone call and ask for a credit is worth the $1 off the next bill?

      1. Kevin Fitchard Tuesday, January 3, 2012

        Actually, I don’t think a full month’s credit or $30 or $40 would be unreasonable to ask. It’s not the total length of time the network was down, but that it was down at three critical times.

  5. Kevin,
    You need to do some more research, they’ve had several outages throughout the year that are all quite similar. Prior to December there were 3 others. Seems like they have some problems with the HSS, SPR or AAA. If they spent a little money on lab simulation they would catch the problems in their network design before they show up in the field.

    1. Oh please, it's impossible to simulate the production network when dealing with systems on this scale. Do you think they'd spend millions of dollars on a network like this and put it into service with no testing?

  6. Can you hear me now? A good service company provides the human components above and beyond everything else. A good infrastructure-as-a-service provider also provides the service of acquiring infrastructure… Verizon has done a good job of acquiring towers, but has done a poor job of acquiring talent that has the authority street-team customer service with either concierge quality or efficient pricing. Can you hear me now? We want a utility model for wireless access that doesn't need to layer competitive services on top of what's a natural monopoly in terms of tower infrastructure. Stop buying software companies and start buying software and selling network infrastructure as a service that you focus on. Can you hear me now?

  7. LTE is a delivery mechanism. If there is a complete outage, it is because the outage lies in a more traditional network failure, whether it be zoya, tata, level 3 or anybody else. They have some major core issues that need resolved.

  8. Verizon outage Feb 17 2012 scion WHY and when will it be back on line?
    Location Shelbyville, Indiana
