Summary:

Building more powerful user interfaces isn’t just about software such as the newly launched product from Aquifi; it’s about providing more context. That will make UIs better and the programming behind them harder.

The launch of Aquifi, a startup offering a gesture-based UI, plus a weekend spent programming my connected home has left me thinking about the wall technologists are crashing into, where consumer expectations of next-generation user interfaces meet the current limitations of computer intelligence.

As we expect our devices to become more contextually aware and our interactions less constrained by a screen, the binary nature of computer-human interactions is becoming increasingly fraught. In my home I encounter it as I try to program the house’s Awake mode to switch only when two specific sensors are triggered (triggering just one might mean I’m getting up in the middle of the night) and only during a certain time of day, on days that aren’t weekends or holidays. It’s beyond stupid to think this is going to work.
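To give a sense of how fiddly that rule gets, here is a minimal Python sketch of the logic I’m describing. The sensor names, the time window and the hand-maintained holiday list are all hypothetical stand-ins for whatever a given home automation hub actually exposes.

```python
from datetime import datetime, time

# Hand-maintained holiday list -- exactly the kind of thing I'd rather not curate myself.
HOLIDAYS = {"2014-01-01", "2014-05-26", "2014-07-04"}

def should_switch_to_awake(bedroom_motion, hallway_motion, now=None):
    """Switch to Awake mode only if BOTH sensors fired, it's morning,
    and today is neither a weekend nor a listed holiday."""
    now = now or datetime.now()
    both_triggered = bedroom_motion and hallway_motion    # one alone = a middle-of-the-night wander
    in_window = time(5, 30) <= now.time() <= time(9, 0)   # "a certain time of day"
    is_weekend = now.weekday() >= 5                       # Saturday = 5, Sunday = 6
    is_holiday = now.strftime("%Y-%m-%d") in HOLIDAYS
    return both_triggered and in_window and not is_weekend and not is_holiday
```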

And yet, when I figure something out and it does come together, it’s amazing. In reading about Aquifi, the new Benchmark-backed startup that hopes to refine gesture controls into something a bit more casual using the low-cost image sensors already on devices, the parallels between trying to convey gesture-based instructions to a device and trying to program my home to act on my behalf were striking.

Aquifi is using visual recognition of a person’s face as a means of identification, and the position of that face as a signal of the user’s intent. When you look at a device running the Aquifi software, it should pay attention, much the way saying “Okay Glass” wakes up Google’s Glass product. The software can also purportedly use machine learning to adapt its responses to the person or people controlling it.

Much like my home is becoming a testing ground for more sensors and triggers, always-on imaging sensors are becoming an important element of phones, tablets and even connected home devices beyond cameras. Companies such as ArcSoft and Rambus are pushing innovations on both the software and hardware sides of the business to make computer vision and gesture-based interactions smarter and more natural.

But if a phone or device is always waiting for a visual cue to take some action, it becomes really important to build a vocabulary of gestures that devices can recognize, as well as a way to signal to the device that it’s time to pay attention. Aquifi’s release indicates that facial recognition and understanding where a person is looking will be part of that signaling in its software.
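As a rough illustration of that two-step pattern, an attention signal followed by a gesture drawn from a known vocabulary, here is a small Python sketch. Aquifi hasn’t published an API, so the gaze check, the gesture recognizer and the vocabulary below are all invented stand-ins.

```python
import random

def looking_at_device(frame):
    """Stub: a real system would estimate head pose or gaze direction from the frame."""
    return True

def next_gesture(frame):
    """Stub: a real recognizer would classify hand motion in the frame."""
    return random.choice(["swipe_left", "swipe_right", "palm_out", None])

# An invented vocabulary mapping recognized gestures to actions.
GESTURE_VOCABULARY = {
    "swipe_left": lambda: print("previous item"),
    "swipe_right": lambda: print("next item"),
    "palm_out": lambda: print("pause"),
}

def handle_frame(frame):
    # Step 1: ignore everything until the user's face is oriented toward the device.
    if not looking_at_device(frame):
        return
    # Step 2: map a recognized gesture onto an action; unknown gestures are dropped.
    action = GESTURE_VOCABULARY.get(next_gesture(frame))
    if action:
        action()
```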

My hunch is that while one software package might handle the vocabulary associated with gestures or the actions my home will take, intent and context will likely come from bringing in a variety of clues, from calendar entries to voice cues. So in my home, perhaps instead of looking up all the holidays and excluding them manually from my Awake mode, I just link it to my calendar, which already has that information. Google Research actually just posted a video of this kind of thing in action, in which a sensor on a smartwatch communicates with a phone to convey more detailed information about the gesture the person is making.
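As a stand-in for linking the Awake rule to a calendar that already knows the holidays, the sketch below leans on the open-source `holidays` Python package rather than a hand-curated list; a personal calendar API would play the same role.

```python
from datetime import date
import holidays  # pip install holidays -- a stand-in for a real calendar service

us_holidays = holidays.US()

def is_awake_day(day):
    """True on weekdays that aren't US public holidays."""
    return day.weekday() < 5 and day not in us_holidays

if is_awake_day(date.today()):
    print("Awake mode is allowed to trigger today")
```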

Additional context from sensors or other services might also help narrow down the situational intelligence (and processing power) that software like Aquifi’s requires. So maybe my looking at a device won’t be enough to trigger it, but adding a voice cue or a haptic cue will. Or perhaps another app or service I’m using reports to the image sensor that I’m in the middle of cooking or jogging, which lets the Aquifi software narrow its available vocabulary of gestures to the ones likely in those situations.
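Something like that context-narrowing could be as simple as the sketch below, where a reported activity shrinks the set of gestures the recognizer even considers. The activity labels and gesture sets are invented for illustration.

```python
# Invented mapping from a reported activity to the gestures worth matching in it.
CONTEXT_VOCABULARIES = {
    "cooking": {"next_step", "set_timer", "stop_timer"},
    "jogging": {"skip_track", "pause_music"},
}
FULL_VOCABULARY = set().union(*CONTEXT_VOCABULARIES.values())

def candidate_gestures(reported_activity):
    """Return the subset of gestures the recognizer should consider right now."""
    return CONTEXT_VOCABULARIES.get(reported_activity, FULL_VOCABULARY)

print(candidate_gestures("cooking"))  # only kitchen-relevant gestures remain in play
```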

Aquifi was co-founded by CEO Nazim Kareemi, who also co-founded Canesta, a maker of 3D image sensors used for gesture interfaces. The Palo Alto-based startup has some big names and years of experience solving gesture-related problems in its ranks. But if we want more natural UIs without the power draw and processing demands of a supercomputer, I think context clues provided by an ecosystem of other apps and services will be necessary. So no matter how great your engineers are internally, your products will likely have to gain knowledge from the broader world.

  1. I cannot make up my mind about gesture control. I played with a Kinect a few years back, then there was Leap Motion, and recently a cheaper interface appeared on the cover of Elektor Magazine (http://www.elektor-magazine.com/en/magazine-contents/elektor-052014.html). The tone of the debate is that of a solution looking for a problem.

    My feeling is that if you are having a dialog with something that only has the intelligence of a snail, then a simple switch (or an array of them, as in a remote control) is perfectly fine. The small problem with the latter is that it occasionally gets misplaced, but like all lost things (apart from socks) it eventually turns up, so there is no real need to resort to the hallowed universal-remote panacea.

    On the other hand, there is this sneaky feeling that there ought to be some scenario that needs a bit more interface gimcrackery than is currently available. I always remember my first Automated House tour at THORN EMI in Hayes. The split-screen master controls, which showed someone changing channels on the TV above the bath, were the most popular feature. Contrast this with the recent lurid UK interest in a hacker being able to remotely control a baby monitor camera, and you are left wondering what system has constraints that produce a novel set of requirements. Recent interest has been in technology for the aged and infirm, but only time will tell whether my Dad’s technophobia is any measure of that market segment.
