Back in September 2006, Nokia (NOK) CEO Olli-Pekka Kallasvuo declared that phones were no longer phones but multimedia computers that play back music, record videos, snap photos and, oh yes, make phone calls. Apple’s (AAPL) iPhone has only reinforced that notion. And as the phone morphs into a multimedia marvel, there is a growing realization that the traditional user interface of a phone, the 12-key keypad, may no longer be enough.

The keypad limits how much information we can input into the tiny devices, and acts as a speed bump when we’re trying to navigate through a complex array of features. And what that means is that we need new ways to interact with mobile devices. Apple, for one, has bet on the touch screen and the fluid UI.

And then there are those who believe that voice input is the way to go.

Microsoft (MSFT) bet about $900 million when it bought TellMe Networks. Some startups are voice believers as well, such as Cambridge, Mass.-based Vlingo Corp, which I’ve previously written about. Earlier this week, I got a chance to see a demo by Yap, a Charlotte, N.C.-based company with a similar approach — that is, taking voice and inputting it as text for everything from IM to navigation.

As with Vlingo, you need to download a Yap application onto your mobile phone to get going, and then you use voice to enter everything from instant messages to TwitterGrams. Yap also does its voice processing on the server side, and then sends the results back over the mobile data channel.

There are others who are taking speech recognition even further — embedding it right into the chips that go into Bluetooth headsets. Cambridge Silicon Radio, a maker of Bluetooth chips, is now embedding speech recognition technology from Sunnyvale, Calif.-based Sensory into chips that will find their way into the Bluetooth headsets by the first quarter of 2008.

Sensory CEO Todd Mozer believes that everyone wants to do big things with speech recognition and mobiles, but in the end, the simple functions that enhance the hands-free experience are what make the most sense and are the most useful. Bluetooth devices make perfect sense as a starting point for voice commands. I agree with Mozer.

When I saw the Yap demo, I got the feeling that the application was trying to do too much; it needed some focus. After all, Yahoo (YHOO) and Google (GOOG) can take a similar server-centric voice recognition approach and provide a richer offering. Moreover, they can use their partnerships with large carriers to squeeze out little players such as Yap.

P.S.: All this interest in voice recognition and related technologies could explain the nice bump in the share price of Nuance (NUAN): up 64 percent for the year, despite a recent pullback that came mostly because the company’s move into the mobile voice recognition arena isn’t sitting well with the Wall Street types.

Related Post: Sit Up and Listen, Future of Software.

  1. Microsoft didn’t pay $900 million for TellMe; it was more like $750 million.

  2. Still the best Grammar Editor on earth is Grammar Studio at Voice Web Solutions. No one can do visual grammar editing better.

  3. I agree that voice is the next big thing, but it needs to come a lot farther than it has. I hate dealing with any customer service that has voice recognition; it just does not work well enough. Has anyone had any good experiences with it so far? Texting and IM are all fine and dandy, but let’s get some real-world applications set up.


  4. Why would you want to talk all the time? Imagine coming back tired from the office and saying “bring up the Jetsons” versus clicking three buttons on your remote and bringing up the show.

    The mouse/remote/keypad wins hands down (pun intended).

    Sometimes we do not like to talk. HAL was irritating for that reason.

  5. I think voice recognition is a good fit for some scenarios, say looking up numbers or dictating short text messages. Similarly, voice can be very useful for entering addresses and the like for maps.

    Pravin, the scenario here is for the mobile phones, not for home. I guess, at home the remote control always wins.

  6. Voice recognition in phones is nothing new. I still remember the earliest Samsung cellphones, with the “Please say Name” (not a typo) voice dialing.

    While voice recognition has come far, it is still a very clunky way to deal with the system, and the novelty wears off after a while. Mistakes by the system are really irritating to deal with when you are trying to get something done quickly.

    On a lighter note, Jeremy Clarkson’s infamous tryst with the Mercedes S-Class’s voice recognition should bring a chuckle. It points out why speech recognition still has its limitations as an input method.


  7. Alexander van Elsas Wednesday, September 19, 2007

    KPN, the Dutch telecom operator, has done many different experiments with voice recognition in live services. It is a difficult technology to make work for millions of people. Very simple tasks can be supported; for example, there is a traffic jam service that you can call, and it tells you whether there is a traffic jam on a road that you name. There is also a directory service for getting phone numbers.
    Better results are obtained in security, though. Voice patterns can be used as a security measure in, for example, banking by phone. As a stand-alone technology, it is still just not good enough in my opinion.

  8. Om, sincerely appreciate your comments. The broad demo was intentional in order to showcase the flexibility of the platform (dependent on the endpoint we’ll have more or less options available to the user). Another thing to remember is that mobile consumers often times do not want to switch contexts; if Yap can support 80% of what you’d normally need a mobile browser for, we’ll be successful…press a button, ask for something, boom, and then we get the heck out of your way.

    As for competition from the portals, if their strategic strength were as you portrayed it, the white label search providers (such as JumpTap and Medio) would not exist. You should hang out on the East coast more to reset your perceptions there. ;-) Our aspirations are well aligned with the carriers’ strategies, and we expect long and fruitful partnerships with them. Warm regards and thank you for your insightful questions at the event! i.

  9. I’ve always been a proponent of voice-commanded portable devices, if for no other reason than that a non-physical interface is optimal as the size of the device shrinks to (near) zero.

    Even though I have small fingers, it has always seemed apparent to me that using the keypad on a mobile phone, for instance, is just clunky. There’s only so much variety in input you actually need from a mobile appliance, yet this variety is greater than a 12-key keypad can offer.

    The only other input that I think is worthy of a mobile device’s form factor is some kind of eyeball tracking, but we’re not there yet.

    I’m on the fence about the iphone/ipod touch model, but I can’t get over the idea that a mobile appliance’s access should optimally be limited in scope.

  10. One of the issues I remember from working with voice portals based on natural language recognition back in 1999 was the database.

    Every question should get some sort of response. (For us it was called the beep problem when there was no entry in the database).

    If the playing field is pretty limited, like the traffic reports, that isn’t a problem, but when you’re trying to roll out numerous services, the number of responses starts to add up.

    I’m not in that field anymore, but what I remember is the number of people needed to train the software so it would be usable.
