Is Google scared of Siri? Is Yelp? Is Facebook? If they aren’t, they should be, as should any mobile website, service or app that depends on advertising for revenues. Siri is just the beginning of a new wave of user interfaces (UIs) that will gradually shift our attention away from our phones’ screens, allowing us to interact with our devices in ways that don’t involve tapping keys and staring at pixels.
These technologies will have a considerable impact on mobile advertising business models, which depend on consumers’ eyes being focused on their screens to work, said Vlad Sejnoha, CTO of Nuance, the voice recognition and artificial intelligence (AI) company behind Siri. Apple’s new voice assistant hasn’t exactly eliminated the need for visual interaction with a smartphone, Sejnoha said, but its effects are already being felt on search portals, which traditionally act as the middleman between consumers and the information they seek.
The most oft-cited example is “Siri, call me a cab.” Rather than perform the usual local search, displaying a list of taxi companies and word ads on a screen, Siri does all of the dirty work in the background, automatically placing a call to what its AI feels is the most relevant dispatcher. That bypass of the search portal is changing the way that cab companies present themselves on the Web, spawning a new type of SEO: ‘Siri’ engine optimization. Being in the top three listings of a local Google search or depending on AdWords to push your website to the top of sponsored results is no longer good enough, if Siri is doing the searching instead of the consumer.
Nuance’s own Dragon Go iOS app makes a similar end run around the search portal. Dragon Go ingests a spoken search term such as “new Mexican restaurants” or “show times for Sherlock Holmes” and spits out websites displayed in a carousel, which the user can then thumb through. The websites themselves are still served up, and thus so are the ad impressions. But the portal middleman is eliminated, at least from the consumer’s perspective – Google’s engine may be used to power that search, but its customer-facing website or app never comes into the equation.
“Speech and natural-language search adds a new element to the UI,” Sejnoha said. “It can skip over steps you’d usually take to get information. It’s a shortcut between a user’s intent and a positive outcome. That’s a powerful concept.”
You ain’t seen, heard or felt nuthin’ yet
While voice assistants like Siri may seem revolutionary today, they’re actually rather simplistic compared to what UIs will be capable of as artificial intelligence and new multimodal means of interaction develop. Phones are already full of sensors that could be used to input information, Sejnoha said. If you don’t like the answer your phone is giving you, a simple shake of the handset could tell it to find you a better one. Rather than speak or type a search term into a user interface, you could point your phone’s camera at a bus stop sign, a restaurant or even a magazine ad, and within seconds have the relevant bus schedule, menu or product information displayed on your screen or dictated back to you.
The phone’s accelerometers could be used to power gesture-based commands — a flick to the right means you want to make a phone call, while a flick to the left initiates a Web search. Micro-projectors could even displace information you would normally need to consume from the screen. For example, instead of displaying a tiny map through LEDs and glass, the phone can paint an arrow on the ground showing you the direction you need to head.
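To make the flick idea concrete, here is a minimal Python sketch of how a handset might map a sideways jolt to a command. It is an illustration only, not how Nuance or any phone maker does it: the function names, the 12 m/s² threshold and the convention that positive x-axis readings mean "right" are all assumptions.

```python
from typing import Optional, Sequence

# Hypothetical threshold (m/s^2): lateral acceleration above this counts as a flick.
FLICK_THRESHOLD = 12.0

# Mapping taken from the article's example: right = phone call, left = Web search.
ACTIONS = {"right": "phone_call", "left": "web_search"}


def detect_flick(x_accel: Sequence[float],
                 threshold: float = FLICK_THRESHOLD) -> Optional[str]:
    """Return 'right', 'left' or None from a window of x-axis accelerometer samples.

    Assumes positive x means the phone was pushed to the right.
    """
    if not x_accel:
        return None
    # The sample with the largest magnitude decides the direction.
    peak = max(x_accel, key=abs)
    if abs(peak) < threshold:
        return None  # too gentle to be a deliberate flick
    return "right" if peak > 0 else "left"


def action_for(x_accel: Sequence[float]) -> Optional[str]:
    """Translate a window of samples into the command it should trigger, if any."""
    direction = detect_flick(x_accel)
    return ACTIONS.get(direction) if direction else None
```

A real implementation would also have to filter out gravity and ordinary hand movement, which this sketch ignores.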
As natural language interpretation and artificial intelligence technologies improve, the phone begins to comprehend subtleties in meaning and interpret intent, Sejnoha said. Nuance is working with IBM’s Watson program on deep question understanding and natural language processing. The hope for both companies is to create sophisticated network-based artificial intelligence that can power semantic searches.
Say you’re a hungry pedestrian walking the streets of New York and suddenly spot a new restaurant. You then make a gesture with your device to activate its voice assistant, point it at the restaurant and ask: “What rating did the New York Times give this joint?” GPS knows your precise location. The digital compass knows which direction you’re pointing. The UI then pings its AI servers in the cloud, which not only determine that you’re interested in a restaurant, but that you are specifically requesting the number of stars awarded it by a specific publication. The AI then scrapes that data off of NYT’s website and returns with a simple audio answer: “3 stars.”
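The "point and ask" step comes down to simple geometry: compare the compass heading with the bearing from your GPS position to each nearby business, and pick the closest match. The Python sketch below shows one way that could work. The function names, the sample coordinates and the 20-degree tolerance are assumptions made for illustration; the article does not describe any actual implementation.

```python
import math
from typing import Dict, Optional, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def bearing_deg(origin: LatLon, target: LatLon) -> float:
    """Initial great-circle bearing from origin to target, in degrees from north."""
    lat1, lon1 = map(math.radians, origin)
    lat2, lon2 = map(math.radians, target)
    dlon = lon2 - lon1
    y = math.sin(dlon) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two headings, in degrees (0 to 180)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def pointed_at(user: LatLon, heading: float, places: Dict[str, LatLon],
               tolerance: float = 20.0) -> Optional[str]:
    """Return the name of the place best aligned with the compass heading.

    Returns None if nothing lies within `tolerance` degrees (a hypothetical cutoff).
    """
    best_name, best_diff = None, tolerance
    for name, location in places.items():
        diff = angle_diff(bearing_deg(user, location), heading)
        if diff <= best_diff:
            best_name, best_diff = name, diff
    return best_name


# Made-up coordinates: one place due north of the user, one due east.
user = (40.7500, -73.9900)
places = {"noodle bar": (40.7510, -73.9900), "duck house": (40.7500, -73.9890)}
print(pointed_at(user, heading=85.0, places=places))
```

With the phone pointed roughly east (85 degrees), the sketch picks the place due east of the user. Distance is ignored here; a real system would presumably weigh it alongside direction, since GPS and compass readings are both noisy in a city.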
In that scenario, the UI is already skipping over several websites and applications that a normal search would have used to find that 3-star rating: mapping apps, search engines and the New York Times website. But what if that phone delivered more than just the basic data point requested? What if it correctly inferred you were looking to nosh and wanted help selecting a restaurant? It could search out other restaurant reviews from sites it knows you trust. It could scour the restaurant’s menu for dietary information. And it could poll your social networks to see if any of your friends had eaten there. The AI then contextualizes all of that info, decides what’s most relevant to you based on preferences and profile information, and spits out this answer:
Three stars, but Yelp gives it three and one half. Bob loves it. He says try the smoked-tea duck. Tell the waiter about your peanut allergy, though. The kitchen stir-fries with groundnut oil.
With that one semantic search, your device captures information from dozens of Websites, but by performing that search with a multimodal interface, at no point do you have to look at the phone’s screen. That also means at no point do you have to look at an ad. “Along the way value is derived from those brief stays on the portals through ad impressions,” Sejnoha said. “If you’re getting from A to B in fewer steps, you’re bypassing those tolls.”
What’s a Google to do?
These new interfaces put a lot of power into the hands of the UI creator, making them the new gatekeepers for information on the Web. That’s a huge benefit for big web brands that can strike up partnerships with the interface developer. For instance, Nuance works with Milo to deliver online-shopping results and Fandango for movie show times on its Dragon Go app. But since the focus of such interfaces is to deliver a single result rather than a variety of results, Internet companies without the desire or resources to partner with Apple or Nuance may find themselves marginalized.
Less choice may be exactly what consumers want, especially in the mobile world where people are using their phones to grab snippets of timely information rather than conduct extensive research. According to Technology Business Research senior analyst Ezra Gottheil, implementations of Siri are fixing what many consider a critical flaw in the user interface: the huge number of options a user is faced with on the average smartphone. By simplifying the UI with voice commands that yield immediate access to results, consumers are likely to use searches and applications more frequently, Gottheil said in a TBR research note.
As for the search portals, if Google or Bing don’t want to get cut out of the value chain, the easy answer is to implement multimodal search interfaces of their own. Both engines make powerful use of artificial intelligence to refine their searches, while Google and Microsoft already accept voice-prompted search terms. It wouldn’t be hard for them to project an ‘optimal’ search result beyond the confines of the screen – a voice-activated and voice-delivered version of Google’s “I’m feeling lucky” button.
But if Google and Bing are delivering direct results, rather than a list of links, what of their keyword-advertising-based business models? And what of the websites themselves that find their information hijacked by a multimodal UI without even an ad impression in compensation? Sejnoha thinks they just need to be more creative with their business models.
The screen is never going away, Sejnoha said. There are certain types of information that people will always need to absorb visually. It’s much easier to look at a pie chart than have it described to you. And unless you’re driving, it’s easier to read a restaurant review than to have your phone dictate it to you. But in the cases where information is more handily delivered off screen, companies need to find ways to draw their customers’ attention back to their devices, Sejnoha said.
Imagine in the restaurant example above that after your phone delivered its personalized and contextualized information about the restaurant in question, it added the following: “By the way, I have some coupons for other restaurants in the neighborhood, if you’d care to look at my screen…”
Image courtesy Flickr user Lazurite