
Summary:

Speech technology is poised to be a game-changer for smartphones, especially as it gets embedded into operating systems and hardware. Nuance CTO Vlad Sejnoha said speech is transforming from an alternative to text input into a powerful tool that can connect users more quickly to information.


We’re still waiting to see the public unveiling of built-in speech technology in iOS 5, something that has been uncovered but has yet to be acknowledged by Apple. Nuance Chief Technology Officer Vlad Sejnoha, whose company is apparently providing the speech technology in iOS 5, isn’t sharing any information about the implementation.

But Sejnoha sat down with me today at the SpeechTek conference in New York City and painted a broader picture of how speech technology is poised to be a game-changer for smartphones, especially as it gets embedded deeper into operating systems and hardware. He said speech is transforming from an alternative to text input into a much more powerful tool that can understand user intent and connect users more quickly to information, using natural language processing, semantic analysis and cloud computing. In essence, speech is becoming the smart shortcut for mobile.

“Speech can be built to complement and enhance other interfaces helping you find existing applications or information you’re familiar with but it’s hard to launch or find. Speech can get you there. It’s really powerful direct access; we’re just entering an amazing era of speech,” said Sejnoha.

We’re seeing some of that with Nuance’s latest Dragon Go app, which answers voice-activated searches by pulling up a host of websites and apps that complete a user’s query. Instead of returning a list of search result links, Go lets people complete actions faster. Vlingo and Siri, the Nuance-powered app bought by Apple, are also pursuing this concept of a smart voice-activated assistant.

But the real power will be in allowing users to make queries of unstructured data and get back answers immediately. He said Nuance’s new partnership with IBM’s Watson program, which uses IBM’s deep question answering, natural language processing, and machine learning capabilities, will be useful for mobile users, helping them seek answers that are currently hard to come by.

For instance, a user could ask what friends thought of a particular movie or restaurant, and Nuance and IBM could digest their social media information and come up with an answer. Or a vacationer driving around in a new city could ask what time local restaurants or businesses close. While this type of technology will be useful in enterprise settings, Sejnoha said it will be very valuable to mobile users, who are more intent-driven and want a focused distillation of answers.

He said it makes more sense to move speech technology deeper into smartphones, integrating it from the ground up, which will open even more opportunities for speech to aid in the user experience. Sejnoha said handset manufacturers are very interested in integrating speech as a differentiator, to make their hardware stand out. Ultimately, he believes all smartphones will include some form of this deep speech integration working in concert with the visual interface. We’re already seeing that with Microsoft’s use of speech in Windows Phone 7 and Google’s use of voice in Android.

“It’s a new kind of control, like a natural language overlay over the visual framework,” Sejnoha said. “We will have this ability to jump and short-cut through the mini desktop to grab and control things and the best way to do that is getting that capacity up front and having the visual stuff play with that.”

The combination of speech and mobile isn’t just going to change the user experience. It’s already transforming Nuance, which is increasingly finding most of its work deals with mobile.

“We used to have a mobile unit, but that’s losing meaning now because all of our business is turning mobile,” Sejnoha said.

  1. In real life, speech is aided by context: situation, visuals, location … in other words, occurrence. As long as speech to a system is audio only, it’s not easy to achieve understanding, which is not the same as recognition. But it does help, especially in information retrieval, since both rest on context.

    From my experience with Google speech input, it’s built around recognition; I don’t think it’s ready for consumers.

  2. “that can understand user internet and connect them more quickly to information,”

    Ryan, did you mean “user intent”?

  3. We’re seeing very rapid growth of adoption of this technology in mobile. iSpeech (http://www.ispeech.org) just launched a free iOS SDK last week for mobile developers to speech-enable their apps. Over 3,000 registered developers already and many more signing up every day. Just a matter of time before this goes mainstream.

