By Surj Patel
We speak faster than we can type. Much faster. An American speaking English averages 120-160 words per minute in conversation, whereas the average typing speed is about 40 words per minute, and even the top 10 percent of typists manage 64 words per minute or faster. Do the math. You could communicate three to four times faster with a computer if it could understand you.
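For the skeptics, here's the back-of-the-envelope version in a few lines of Python, using the figures above:

```python
# Back-of-the-envelope math for the claim above, using the same figures.
speech_wpm = (120, 160)  # average conversational English, words per minute
average_typist_wpm = 40  # average typing speed
top_typist_wpm = 64      # top 10 percent of typists

for wpm in speech_wpm:
    print(f"speaking at {wpm} wpm: "
          f"{wpm / average_typist_wpm:.1f}x an average typist, "
          f"{wpm / top_typist_wpm:.1f}x a fast one")
# speaking at 120 wpm: 3.0x an average typist, 1.9x a fast one
# speaking at 160 wpm: 4.0x an average typist, 2.5x a fast one
```

Even against the fastest typists, speech wins comfortably.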
So why on earth are we still using two-dimensional representations of the real world like desktops, menus and pointers? I believe we could do a lot better by exploring voice-driven user interfaces: not just pure voice interfaces, but also what are known as multimodal interfaces, which combine the best of vocal and audio interface elements with the best of what the screen can do.
Voice-driven user interfaces have long been the holy grail of human-computer interface design. Science fiction captures this well: in our fantasy worlds, characters interact with machines by speaking to them, not by clicking through some desktop start menu. (Remember Scotty in Star Trek IV: The Voyage Home? “A keyboard? How quaint!”)
Unsurprisingly, Microsoft (MSFT) has been slowly but surely forging ahead in this area for a long time now. It started with massive investments in the ill-fated Lernout & Hauspie, whose founders pioneered a lot of speech interface technology. Microsoft built the technology into various facets of the Windows operating system, and although most of the functions were never widely adopted by the general public, they still remain there today.
Its recent acquisition of Tellme Networks was eye-opening. Most people saw the acquisition as a way for Microsoft to fold the other killer app of business technology, the telephone, into a bigger offering in which Microsoft Office is integrated with voice-based unified communications.
I think, however, that Microsoft is smarter than that. When it comes to speech, speech interfaces and actually using speech interfaces to make money, Tellme has one of the best teams in the world. They helped push and standardize the VoiceXML (VXML) format, which made telephony developer-friendly, and they released an awesome multimodal local search app as well. More than any other company, Tellme will know how to scale from the intranet and Internet to the worldwide telephone network. Don’t forget, there are about 400 million PCs in the world and 2.2 billion telephone handsets. That math wasn’t hard to comprehend in Redmond.
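To give a flavor of what VoiceXML buys developers, here's a minimal, made-up menu dialog of the kind a Tellme-style voice browser fetches over HTTP and renders as a phone call. The prompts and menu options are invented for illustration, and I've wrapped the markup in a few lines of Python only so it can be sanity-checked as XML:

```python
import xml.etree.ElementTree as ET

# A minimal, invented VoiceXML 2.0 document: the caller hears a prompt,
# says one of the choices, and the browser jumps to the matching dialog.
VXML_MENU = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu>
    <prompt>Say sales, support, or billing.</prompt>
    <choice next="#sales">sales</choice>
    <choice next="#support">support</choice>
    <choice next="#billing">billing</choice>
  </menu>
  <form id="sales">
    <block><prompt>Connecting you to sales.</prompt></block>
  </form>
  <form id="support">
    <block><prompt>Connecting you to support.</prompt></block>
  </form>
  <form id="billing">
    <block><prompt>Connecting you to billing.</prompt></block>
  </form>
</vxml>"""

# Sanity-check that the markup is well-formed before serving it.
ET.fromstring(VXML_MENU)
print("well-formed VoiceXML menu, ready to serve")
```

The whole point of the standard is that a web developer can write a phone application the same way they write a web page, and leave the speech recognition to the platform.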
So what, potentially, is next? What are some of the ways that voice-driven user interfaces could change our computer experience for the better?
Idea 1: Help mechanisms. Rather than presenting me with thousands of options in a help menu, none of which have anything to do with the problem I actually have, just connect me, voice-to-voice, with an expert.
Idea 2: Related conversations. Let me start a conversation around a document while I’m reading it, or give the document a persistent chat thread of its own. Failing that, the back end could organize a meeting for everyone involved in the doc and make it happen. After all, you want to discuss that doc, right?
Idea 3: Smarter transcription. Transcription, in other words, that takes into account a person’s cultural background and doesn’t interpret every sound as though it were coming from someone whose first language is English.
Idea 4: Coordination commands. “Open the memo Steve wrote last month, add the comments from the last document we worked on and e-mail them out to the people in this spreadsheet. Make sure there is a read receipt for everyone and schedule a conference call between them for the 24th.” What do you suppose the sequence of menu commands and clicks would look like for that?
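Here's a rough sketch, in Python, of the plan a voice front end would have to compile that single sentence into. Every operation name below is invented, not a real API; the point is the fan-out from one utterance to a half-dozen discrete actions:

```python
# A single spoken sentence fans out into a plan of discrete operations.
# Every operation name here is hypothetical -- no real API is implied.
PLAN = [
    ("find_document",    {"author": "Steve", "kind": "memo", "within": "last month"}),
    ("extract_comments", {"source": "last document we worked on"}),
    ("merge_comments",   {"into": "Steve's memo"}),
    ("read_recipients",  {"source": "this spreadsheet", "column": "email"}),
    ("send_email",       {"attachment": "annotated memo", "read_receipt": True}),
    ("schedule_call",    {"participants": "recipients", "day": "the 24th"}),
]

for step, (op, args) in enumerate(PLAN, 1):
    rendered = ", ".join(f"{k}={v!r}" for k, v in args.items())
    print(f"{step}. {op}({rendered})")
```

Six back-end operations for one sentence of speech, versus dozens of clicks across four different applications.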
We can improve other activities through voice technologies as well, such as browsing, searching and general UI. A conversational (discourse-driven) approach to searching for documents would be particularly useful and rewarding. I don’t know what Google (GOOG) is up to in this area, but with its unholy brain trust the company must be looking at this, especially if it’s starting to build out its rumored GPhone play. And take a look at the console games industry, which has used voice to make games more fun.
We also need a lot of other things, like common-sense technologies and artificial intelligence, to make really good user experiences. But I’d encourage software manufacturers to start bringing simple but useful speech elements into their user interfaces now, so we can start adding them to the UI vocabulary of people out there. Just as long as they’re brought in with usefulness in mind, not as an afterthought.
And in an effort to practice what I preach, this article was dictated using DragonDictate (now known as Dragon NaturallySpeaking), which I can assure you is damned accurate, with only a few corrections required. Without the acceleration it provides, especially given my two-fingered keyboard skills, neither this article nor my obscenely full e-mail inbox would ever get any attention. I only wish it could proofread for me.
Ah well, maybe one day.
{"source":"https:\/\/gigaom.com\/2007\/09\/18\/sit-up-straight-and-listen\/wijax\/49e8740702c6da9341d50357217fb629","varname":"wijax_7299463914745ea6df7f8bfa49feae48","title_element":"header","title_class":"widget-title","title_before":"%3Cheader%20class%3D%22widget-title%22%3E","title_after":"%3C%2Fheader%3E"}