The next time you’re watching a robot hand someone a cup of coffee, ask someone a simple question or even drive a car, do yourself a favor and don’t be such a critic.
Yes, a lot of what so-called intelligent or learning robots are doing is still fairly simple — some of it borders on mundane — but they’re not exactly working with a human brain. The fact of the matter is that machine learning is really hard; most artificial intelligence is, in fact, very engineered. Finding the right method of interaction between humans and robots might be even harder.
However, deep learning — the approach du jour among artificial intelligence researchers — might be just what the forthcoming robot doctor ordered to cure what ails our robots’ brains. Earlier this month, I spent a couple days at the Robotics: Science and Systems conference and was impressed by the amount of robotics research that seemingly could be addressed using the deep learning techniques made famous over the past couple years by Google, Facebook and Microsoft.
There were talks about nearly every aspect of robotic intelligence, from using a tool called “Tell Me Dave” (which we covered here) in order to crowdsource the process of training robot assistants to perform household tasks, to teaching robots to choose the best path from Point A to Point B. Researchers discussed applications for self-driving vehicles; from analyzing soil types to increase traction in off-road vehicles to learning latent features of geographical locations in order to recognize them in sunlight, darkness, rain or snow.
One of my favorite talks was about a robot dubbed the “Ikeabot” for its focus on helping to assemble furniture. The researchers working on it are trying to figure out the optimal process for communication between the robot and its human co-workers. As it turns out, that requires a lot more than just teaching the robot to understand what certain objects look like or how they fit into the assembly process. How the robot poses requests for help, for example, can affect the efficiency and workflow of human co-workers, and can even make them feel like they’re working along with the robot rather than just next to it.
The connective tissue throughout all these applications and all these attempts to make robots smarter in some way is data. Whatever the input — speech, vision or some sort of environmental sensor — robots rely on data in order to make the right decisions. The more and better data researchers have in order to train their artificial intelligence models and create algorithms, the smarter their robots get.
The good news: There’s a lot of good data available. The bad news: Training those models is hard.
Essentially, machine learning researchers often need to spend years worth of human-hours determining the attributes, or features, a model should focus on and writing code to turn those features into something a computer can understand. Training a computer vision system on thousands of images, just to create a robot (or an algorithm, really) that can recognize a chair, is a lot of work.
This is where new approaches to artificial intelligence, including deep learning, come into play. As we have explained on numerous occasions, there’s currently a lot of effort being put into systems that can teach themselves the features that matter in the data they’re ingesting. Writing these algorithms and tuning these systems is not easy (which is why experts in fields like deep learning are paid top dollar), but when they work they can help eliminate a lot of that tedious and time-consuming manual labor.
In fact, Andrew Ng said in a keynote at the Robotics event, deep learning (a field he says includes, but is not limited to, deep neural networks) is the best method he has found for soaking up and analyzing large amounts of data. Ng is best known for co-founding Coursera, heading up the Google Brain project in 2011 and teaching machine learning at Stanford. Most recently, he joined Chinese search engine giant Baidu as its chief scientist.
But Ng also knows a thing or two about robots. In fact, much of his research since joining the Stanford faculty in 2002 has been focused on applying machine learning to robots in order to make them walk, fly and see better. It was this work — or, rather, the limitations of it — that inspired him to devote so much of his time to researching deep learning.
“I came to the view that if I wanted to make progress in robotics, [I had] to spend all my time in deep learning,” he said.
What Ng has found is that deep learning is remarkably good at learning features from labeled datasets (e.g., pictures of objects properly labeled as what they are) but is also getting good at unsupervised learning, where systems learn concepts as they process large amounts of unlabeled data. This is what Ng and his Google Brain peers demonstrated with a famous 2012 paper about recognizing cats and human faces, and also what has powered a lot of advances in language understanding.
Naturally, he explained, these capabilities could help out a lot as we try to build robots that can better hear us, understand us and generally perceive the world around them. Ng showed an example of current Stanford research into AI systems in cars that distinguish between cars and trucks in real time, and highlighted the promise of GPUs to help move some heavy computational work into relatively small footprints.
And as the deep learning center of gravity shifts toward unsupervised learning, it might become even more helpful for roboticists. He spoke about a project he once worked on that aimed to teach a robot to recognize objects it might spot in the Stanford offices. That project included tracking down 50,000 images of coffee mugs on which to train the robot’s computer vision algorithms. It was good research and taught the researchers a lot, but the robot wasn’t always very accurate.
“For a lot of applications,” Ng explained, “we’re starting to run out of labeled data.” As researchers try scaling training datasets from 50,000 to millions in order to improve accuracy, Ng noted, “there really aren’t that many coffee mugs in the world.” If there are that many images, most of them won’t be labeled. Computers will need to learn the concept of coffee mugs by themselves and then be told what they’ve discovered, because no one can afford to spend the amount it would take to label them.
Besides, Ng added, most experts believe that human brains — still the world’s most-impressive computers, which have “very loosely inspired” deep learning techniques — learn largely in an unsupervised manner. No matter how good a parent you were, he joked, “you didn’t point out 50,000 coffee mugs to your children.”