Blackberry Curve
photo: Flickr / lilivanili

This morning RIM’s co-CEO Mike Lazaridis took to the Internet two more times, once via YouTube and once, with co-CEO Jim Balsillie, in a webcast press conference, to relay an apology from RIM (NSDQ: RIMM) to all of its customers over the outage that started on Monday and extended to email, BBM messaging and internet services for consumers and enterprise users. Most importantly, during the webcast, Lazaridis noted that all services are now up and running again.

A few more details came out about the system failure, but first, another apology: “I want to apologize to all the customers we have let down,” said Lazaridis at the beginning of the call. “You expect more from us. I expect more from us.” He said RIM is committed to restoring that trust over the coming weeks, but both CEOs were still quiet on the subject of compensation for those who have lost service.

Apart from the big news that the service is now up and running worldwide, Lazaridis confirmed that the problem related to a failure in one core switch in the UK, a problem that “cascaded” through the system when the backup switch didn’t function as intended. The failure in Europe, he noted, then overloaded systems in other regions, and the resulting backlog in turn “impaired service levels” there.

As we pointed out yesterday, the architecture of RIM’s data services has proven to be one of the crucial issues in this crisis: all of RIM’s data services — email, BBM messaging and all Internet-based services including apps — are run through RIM’s BlackBerry Internet Servers (BIS). That means when the BIS went down, it took all of RIM’s smartphone mojo with it. The QNX platform that powers the PlayBook runs using a different kind of architecture, and so was not affected by this outage.

Lazaridis noted that this was the biggest failure of its kind on the BlackBerry network, and the first failure in 18 months. Typically, service levels had been running at 99.97 percent before Monday.

The company, he said, was taking “immediate steps” to minimize the risk of this happening again, including working with its vendors to correct the switch that failed and auditing the infrastructure to understand why the system went down in the first place and why it took longer to bring back than expected. (So, still some holes left to fill there.)

Balsillie would not say which vendors were behind the equipment. He said there were multiple companies involved, so it wasn’t fair to pinpoint any one at this point. He noted that CTOs from operators were calling in to offer assistance, but it sounds like the company’s own network engineers were the main ones working through the problem.

The compensation issue will continue to hang over the company, but now that the root problem is sorted out, we can expect to hear from RIM on this.
