In our quest to build robots with artificial intelligence, we face five problems. We covered the first, seeing. The second is contextualizing. Even if a robot can look at something and decide what it is, that doesn't get us much further, because life is not a series of still photos we are recognizing; it is kinetic, it moves. Context is derived from the differences between successive images, and there are many, many possible variations of those differences. Few training sets capture that kind of information.
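As a toy illustration of that idea, deriving a hint of context from the differences between two frames, here is a minimal frame-differencing sketch in Python with NumPy. This is an assumption-laden simplification, not anything a real vision system would rely on by itself:

```python
import numpy as np

def frame_difference(prev_frame, next_frame, threshold=30):
    """Return a mask of pixels that changed noticeably between two grayscale frames."""
    diff = np.abs(next_frame.astype(int) - prev_frame.astype(int))
    return diff > threshold

# Two tiny synthetic 4x4 "frames": a bright patch moves one pixel to the right.
prev_frame = np.zeros((4, 4), dtype=np.uint8)
prev_frame[1, 1] = 200
next_frame = np.zeros((4, 4), dtype=np.uint8)
next_frame[1, 2] = 200

motion = frame_difference(prev_frame, next_frame)
# The changed pixels hint at motion; a real system would have to track and
# interpret such changes across thousands of frames to recover any "context."
```

Even this trivial case shows the scale of the problem: the raw difference tells you *that* something moved, but says nothing about *what* it means.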
When you see your young neighbors, a husband and his very pregnant wife, hurrying toward the car with her holding her belly and him carrying an overnight bag and looking worried, you don't have any trouble figuring out what's going on, but that's hard for a computer. Now let's take it one step further and say you don't see them rush off. It's been two days, and you notice that there are newspapers in their yard and their car's gone. You instantly have a good idea of what happened, and you don't even have to think hard about it. You just casually remark to your spouse, "I think the neighbors must be having their baby." If another neighbor has a son who is about to turn sixteen, and out of nowhere he starts going door-to-door asking to mow yards, you might infer he wants to buy a car. But training a computer to make these kinds of intuitive leaps is well beyond us.