It takes an ecosystem: Building natural UIs isn’t just about sensors and software.


The launch of Aquifi, a startup offering a gesture-based UI, plus a weekend spent programming my connected home, has left me thinking about the wall technologists are crashing into: consumer expectations of next-generation user interfaces on one side, and the current limitations of computer intelligence on the other.

As we expect our devices to become more contextually aware and our interactions less constrained by a screen, the binary nature of computer-human interactions is becoming increasingly fraught. In my home I run into it when I try to program my Awake mode to change only when two specific sensors are triggered (triggering just one might mean I'm getting up in the middle of the night), only during a certain time of day, and only on days that aren't weekends or holidays. It feels almost foolish to expect this to work.
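For the curious, the rule I'm wrestling with boils down to something like the sketch below. The sensor names, time window and hard-coded holiday list are placeholders made up for illustration, not what my hub actually exposes:

```python
from datetime import datetime, time

# Placeholder holiday list and time window -- a real setup would pull these
# from a calendar or the hub's own configuration.
HOLIDAYS = {"2014-01-01", "2014-07-04", "2014-12-25"}
AWAKE_WINDOW = (time(5, 30), time(9, 0))
REQUIRED_SENSORS = {"bedroom_motion", "hallway_motion"}  # hypothetical names

def should_enter_awake_mode(triggered_sensors, now=None):
    """True only if both sensors fired, inside the window, on a working day."""
    now = now or datetime.now()
    both_fired = REQUIRED_SENSORS <= set(triggered_sensors)
    in_window = AWAKE_WINDOW[0] <= now.time() <= AWAKE_WINDOW[1]
    is_weekday = now.weekday() < 5          # Monday=0 ... Friday=4
    is_holiday = now.strftime("%Y-%m-%d") in HOLIDAYS
    return both_fired and in_window and is_weekday and not is_holiday
```

Every one of those conditions has to be spelled out by hand, which is exactly the problem.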

And yet, when I figure something out and it does come together, it's amazing. Reading about Aquifi, the new Benchmark-backed startup that hopes to refine gesture controls into something a bit more casual using the low-cost image sensors already on devices, the parallels were striking between trying to convey gesture-based instructions to a device and trying to program my home to act on my behalf.

Aquifi uses visual recognition of a person's face as a means of identification and the position of the face as a signal of the user's intent. When you look at a device running the Aquifi software, it should pay attention, much as you tell Google's Glass "Okay Glass" to wake it up. The software can also purportedly adapt its responses through machine learning to better respond to the person or people controlling it.
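To make the idea concrete, here's a minimal sketch of that kind of attention gate, using OpenCV's stock face detector rather than anything Aquifi has actually shipped; the point is the gating logic, not the recognition quality:

```python
import cv2

# OpenCV's bundled frontal-face Haar cascade -- a crude stand-in for Aquifi's
# presumably much smarter face and gaze model.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def user_is_looking(frame):
    """Rough proxy for attention: is there a face in the frame at all?"""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    if user_is_looking(frame):
        pass  # only now would the device hand frames to gesture recognition
```

The gate matters because it decides when the far more expensive gesture pipeline gets to run at all.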

Much as my home is becoming a testing ground for more sensors and triggers, always-on imaging sensors are becoming an important part of phones, tablets and even connected home devices beyond cameras. Companies such as ArcSoft and Rambus are pushing innovations on both the software and hardware sides of the business to make computer vision and gesture-based interactions more natural.

But if a phone or device is always waiting for a visual cue before taking action, it becomes really important to build not only a vocabulary of gestures that devices can recognize, but also a way to signal to the device that it's time to pay attention. Aquifi's release indicates that facial recognition and understanding where a person is looking will be part of that signaling in its software.

My hunch is that while one software package might handle the vocabulary associated with gestures or the actions my home will take, intent and context will likely be discovered by bringing in a variety of clues, from calendar entries to voice cues. So in my home, instead of looking up all the holidays and excluding them manually from my Awake mode, I could just link the mode to my calendar, which already has that information (a rough sketch of that follows the video below). Google Research actually just posted a video of this in action, where a sensor on a smartwatch communicates with a phone to convey more detailed information about the gesture the person is making.

[youtube http://www.youtube.com/watch?v=ZKGsb2F9dms&w=853&h=480]
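That calendar link could be as simple as the sketch below, assuming the holidays are published as an .ics feed (the URL here is a placeholder) and using the third-party icalendar package to parse it:

```python
from datetime import date
from urllib.request import urlopen

from icalendar import Calendar  # third-party: pip install icalendar

HOLIDAY_FEED = "https://example.com/us-holidays.ics"  # placeholder URL

def is_holiday(day):
    """True if `day` shows up as an all-day event in the holiday feed."""
    cal = Calendar.from_ical(urlopen(HOLIDAY_FEED).read())
    for event in cal.walk("VEVENT"):
        if event.decoded("dtstart") == day:
            return True
    return False

# The Awake-mode rule earlier could then drop its hard-coded list
# and simply ask: is_holiday(date.today())
```

The home hub doesn't need to know what a holiday is; it just needs to ask a service that already does.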

Additional context from sensors or other services might also help narrow down the situational intelligence (and processing power) that software like Aquifi's requires. So maybe looking at a device won't be enough to trigger it on its own, but adding a voice cue or a haptic cue will. Or perhaps another app or service I'm using reports to the image sensor that I'm in the middle of cooking or jogging, which lets the Aquifi software narrow its available vocabulary of gestures to the ones likely in those situations.
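That narrowing could be as simple as the sketch below, where an activity hint from another app prunes the set of gestures the recognizer even scores. The gesture names and the score_gesture() stub are made up for illustration:

```python
# Hypothetical gesture vocabularies keyed by whatever activity another app reports.
GESTURES_BY_ACTIVITY = {
    "cooking": {"next_step", "set_timer", "volume_up"},
    "jogging": {"skip_track", "pause", "answer_call"},
}
ALL_GESTURES = set().union(*GESTURES_BY_ACTIVITY.values()) | {"wake"}

def score_gesture(frame, gesture):
    """Stand-in for the real vision model; returns a confidence in [0, 1]."""
    return 0.0

def classify_gesture(frame, activity=None):
    """Score only the gestures that make sense for the reported activity."""
    candidates = GESTURES_BY_ACTIVITY.get(activity, ALL_GESTURES)
    scores = {g: score_gesture(frame, g) for g in candidates}
    return max(scores, key=scores.get)
```

Fewer candidate gestures means fewer false positives and less work for the device's vision pipeline.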

Aquifi was co-founded by CEO Nazim Kareemi, who also co-founded Canesta, an earlier company that built gesture-sensing technology. The Palo Alto-based startup has some big names and years of experience solving gesture-related problems in its ranks. However, if we want more natural UIs without the power draw and processing demands of a supercomputer, I think context clues provided by an ecosystem of other apps and services will be necessary. So no matter how great your in-house engineers are, your products will likely have to gain knowledge from the broader world.
