Cut out the middleman

In a data coup, Apical analyzes visual data without the video

Apical, a company best known for the years of work it has spent contributing imaging technology to the camera lenses inside smartphones and security cameras, has now devised a computer vision program for the smart home and business. The company calls its innovations Spirit and ART (short for Apical Residential Technology), and together they are probably the most disruptive technology I’ve seen for deriving context inside the home and processing data for the internet of things.

The reason is that Spirit and ART don’t use video. They take the visual data seen by camera lenses, but they don’t turn it into video for human eyes. Instead, they process the visual data into computer-usable avatars that represent the people in a home. Because this happens on the device that contains the camera lens and the Spirit processor running the ART software, the system can open up several new features in the smart home without freaking people out that videos of their naked bits will somehow get on the internet.
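Apical hasn’t published what these avatars look like internally, but conceptually the output is a small structured record per person rather than a stream of pixels. Here is a minimal sketch of what such a record might contain — every field name below is a hypothetical illustration, not Apical’s actual schema:

```python
from dataclasses import dataclass

# Hypothetical sketch of a per-person "avatar" record a Spirit-style sensor
# might emit instead of video. Field names and types are illustrative only.
@dataclass
class Avatar:
    person_id: int      # anonymous tracking ID, not a real-world identity
    timestamp_ms: int   # when the observation was made
    bbox: tuple         # (x, y, width, height) in sensor coordinates
    posture: str        # e.g. "standing", "sitting", "lying"
    gesture: str        # e.g. "wave", "point", "none"
    is_resident: bool   # matches a known household member?
    confidence: float   # classifier confidence, 0.0 to 1.0

# One observation of the scene reduces to a handful of these records,
# which is all that ever leaves the device.
frame_output = [
    Avatar(person_id=1, timestamp_ms=1_000, bbox=(120, 40, 60, 180),
           posture="standing", gesture="wave", is_resident=True,
           confidence=0.93),
]
```

Even a busy room reduces to a handful of records like this per observation, which is what makes the bandwidth and privacy arguments below plausible.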

Because it only transmits the information the computer needs to identify a person and their gesture, as opposed to all of the background pixels and filler, the system also saves bandwidth by reducing the data load associated with video files. Plus, it’s much faster for a computer to parse machine-readable data than visual data that’s fit for human consumption, like most camera footage. As a result, the system can react much more quickly to movements that people make in the home. Apical CEO Michael Tusch says that the ART software can discern people from pets, adults from children and even people who live in a home from strangers.

The ART software is made possible by the Spirit technology, which is implemented on a piece of silicon on a sensor in the home. That silicon is capable of running the object recognition and machine learning algorithms that researchers are currently using to help computers learn to “see.” Because it does this on chip, it can compress large incoming data streams from 5 gigabits per second down to a few kilobits per second, depending on the output required.
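To put those numbers in perspective, here is the back-of-envelope arithmetic, taking “a few kilobits per second” as roughly 5 kbit/s for the sake of the example:

```python
# Rough data-rate comparison using the figures Apical quotes.
# "A few kilobits per second" is assumed to be ~5 kbit/s here.
raw_rate_bps = 5e9          # raw sensor data: ~5 gigabits per second
metadata_rate_bps = 5e3     # avatar metadata: ~5 kilobits per second

reduction = raw_rate_bps / metadata_rate_bps
raw_per_day_tb = raw_rate_bps / 8 * 86_400 / 1e12           # terabytes per day
metadata_per_day_mb = metadata_rate_bps / 8 * 86_400 / 1e6  # megabytes per day

print(f"Reduction factor: roughly {reduction:,.0f}x")          # ~1,000,000x
print(f"Raw sensor data:  ~{raw_per_day_tb:.0f} TB per day")   # ~54 TB/day
print(f"Metadata stream:  ~{metadata_per_day_mb:.0f} MB per day")  # ~54 MB/day
```

That is roughly a million-fold reduction: terabytes of raw sensor data per day shrink to a metadata stream you could store on a thumb drive for years.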

More use cases

This means the system could help solve the tedious problem of determining whether people in the home have really gone away. For example, my Nest tries to tell various devices in my house when I’m away, but it sometimes figures that out based on whether or not I’ve walked near the Nest thermostat in my upstairs hallway recently. That’s not always an accurate measure of who’s home. Pinning my away status on my or my husband’s handset is equally problematic, because when we leave the house and our daughter is home with a sitter, we’ll get a call telling us that the lights are suddenly off and the alarm has turned on.

In addition to accurately tracking how many people are home without taking video of the home, the Spirit and ART system could offer some other compelling use cases. For example, it could let you know if strangers are at the door, or notice if one of the avatars representing a person inside your home suddenly moved in an unusual fashion, which might indicate a fall. That would be invaluable if you are monitoring an elderly person but don’t really want to spy on their every move.
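Apical hasn’t described how fall detection would actually work, but with a stream of per-person records a simple heuristic becomes possible: flag anyone who goes from an upright posture to lying down and then stays down. A hypothetical sketch, building on the illustrative Avatar record above:

```python
def possible_fall(history, still_seconds=30):
    """Hypothetical heuristic over a time-ordered list of Avatar records
    for one person (see the illustrative record sketched earlier).
    Flags a transition from an upright posture to lying down that
    persists for `still_seconds` or more."""
    if not history or history[-1].posture != "lying":
        return False
    # Walk backwards to find when the current lying stretch began
    # and what posture preceded it.
    lying_since = history[-1].timestamp_ms
    previous_posture = None
    for record in reversed(history):
        if record.posture != "lying":
            previous_posture = record.posture
            break
        lying_since = record.timestamp_ms
    lying_duration_s = (history[-1].timestamp_ms - lying_since) / 1000
    # A fall implies the person was upright immediately beforehand
    # and has stayed down since.
    return (previous_posture in ("standing", "sitting")
            and lying_duration_s >= still_seconds)
```

In practice you would want to exclude beds and sofas and tune the threshold, but the point stands: the decision runs on a few bytes of metadata rather than on footage of the person.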

Of course, machine data can be just as telling as visual data, which means that the cloak of privacy the system provides is mostly just a guarantee that your actual naked pictures don’t end up online. If your employer installed a system like this in the office, it would be just as effective at monitoring your comings and goings as a traditional camera, and maybe even more so, since it can parse data far more effectively than a security guard watching several screens of video feeds day in and day out. Computers don’t rest their eyes or take smoke breaks.

Looking ahead

That also brings up the other really disruptive aspect of this system. Right now, much of the research around computer vision focuses on teaching computers to see like humans do: take videos and pictures of cats and teach computers what features constitute a cat. This approach is different. It takes the visual world that computers see and tells the computers how to act when they see items matching certain patterns. If this approach scales, it could solve a problem that plagues the internet of things and the modern surveillance society in general.

We have far more video being created than we could ever watch or even use, and as we bring cities online and bring things like self-driving cars into the picture (ha ha), we’re adding to the flow of visual information in ways that computers can’t process fast enough. In fact, Bernie Meyerson, a vice president of innovation and IBM Fellow, complained to me about this very issue a few years ago on a podcast, when discussing the internet of things and smarter cities.

One of the big challenges he foresaw with the data being created by cameras around cities like London was that people cannot parse all of that visual information, and neither can computers. But with a system like Apical’s, if it scales, cities could eliminate some of the video and have a system that computers can read. Ironically, you also create a system that people are far more likely to see as less invasive of their privacy, while creating one that can actually parse far more data, far more quickly. Perhaps we can hope that it would be used more for predicting crowd traffic flows, catching actual criminals and serving the social good than for casual surveillance.

For now, Apical is marketing this technology for the smart home, and it hopes to license the technology to companies that would implement it in their own devices, much as big-name firms such as Samsung and Polycom already license imaging technology from Apical for their products. When it comes to creating compelling user interfaces that rely on more contextually aware computers in the home, this technology is a big winner, both in the granularity it can provide and in the privacy it offers. However, like all technology, it is a tool that can be used for good or for some really invasive and scary stuff, absent rules to prevent its abuse.

If you’re interested in learning how deep learning works, why it’s such a hot area right now and how it’s being applied commercially, think about attending our Structure Data conference, which takes place March 18 and 19 in New York. Speakers include deep learning and machine learning experts from Facebook, Yahoo, Microsoft, Spotify, Hampton Creek, Stanford and NASA, as well as startups Blue River Technology, Enlitic, MetaMind and TeraDeep.

Update: This story was updated March 5 to clarify that ARM is not an Apical licensee. It is a technology partner.

2 Responses to “In a data coup, Apical analyzes visual data without the video”

  1. markjnorton

    “Because it only transmits zeros and ones, the system also saves bandwidth by reducing the data load associated with video files.”

    If you give this a little thought, you’d realize how silly this sounds. ALL video transmitted over a network is in binary code. While video could be transmitted in analog form, that’s not how digital video works. Perhaps you meant that the system saves bandwidth vs. analog video – which it does, but this is not a new thing.

    Aside from that, I would agree that this is an innovative product with lots of potential. Doing scene object extraction "in silico" should increase utility and improve privacy. I applaud all efforts to keep and maintain privacy as the Internet of Things juggernaut rolls along.