Microsoft says its new computer vision system can outperform humans

Microsoft researchers claim in a recently published paper that they have developed the first computer system capable of outperforming humans on a popular image-classification benchmark. While it’s estimated that humans can classify images in the ImageNet dataset with an error rate of 5.1 percent, Microsoft’s team said its deep-learning-based system achieved an error rate of only 4.94 percent.

Their paper was published less than a month after Baidu published a paper touting its record-setting system, which it claimed achieved an error rate of 5.98 percent using a homemade supercomputing architecture. The best performance in the actual ImageNet competition so far belongs to a team of Google researchers, who in 2014 built a deep learning system with a 6.66 percent error rate.

A set of images that the Microsoft system classified correctly. “GT” means ground truth; below are the top five predictions of the deep learning system.
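
For readers curious how an error rate like the ones above is computed, here is a minimal sketch in Python. It assumes the reported figures are top-5 error rates, the usual convention for the ImageNet classification task: an image counts as correct if its ground-truth label appears anywhere among the system's five highest-ranked predictions. The function name `top_k_error` and the sample labels are hypothetical and chosen only for illustration; they are not taken from the Microsoft paper.

```python
def top_k_error(predictions, ground_truth, k=5):
    """Return the fraction of images whose ground-truth label is
    missing from the first k entries of the ranked prediction list.

    predictions  -- one ranked list of labels per image, best guess first
    ground_truth -- one ground-truth ("GT") label per image
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have equal length")
    if not ground_truth:
        raise ValueError("at least one image is required")

    # An image is a miss when its GT label is not among the top k guesses.
    misses = sum(
        1
        for ranked, gt in zip(predictions, ground_truth)
        if gt not in ranked[:k]
    )
    return misses / len(ground_truth)


# Hypothetical data: four images, five ranked guesses each.
sample_predictions = [
    ["sheep", "ram", "llama", "ox", "goat"],
    ["cello", "violin", "guitar", "banjo", "harp"],
    ["daisy", "sunflower", "rose", "tulip", "orchid"],
    ["spotlight", "lamp", "torch", "candle", "lantern"],
]
sample_ground_truth = ["sheep", "violin", "daisy", "restaurant"]

top5 = top_k_error(sample_predictions, sample_ground_truth, k=5)
top1 = top_k_error(sample_predictions, sample_ground_truth, k=1)
print(f"top-5 error: {top5:.2%}, top-1 error: {top1:.2%}")
```

In this toy data the second image is a top-1 miss but a top-5 hit, because the correct label is ranked second. The fourth image is a miss under both measures: the guesses may describe objects in the scene, but none matches the given label.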

“To our knowledge, our result is the first published instance of surpassing humans on this visual recognition challenge,” the paper states. “On the negative side, our algorithm still makes mistakes in cases that are not difficult for humans, especially for those requiring context understanding or high-level knowledge…

“While our algorithm produces a superior result on this particular dataset, this does not indicate that machine vision outperforms human vision on object recognition in general . . . Nevertheless, we believe our results show the tremendous potential of machine algorithms to match human-level performance for many visual recognition tasks.”

A set of images where the deep learning system didn’t match the given label, although it did correctly classify objects in the scene.

One of the Microsoft researchers, Jian Sun, explains the difference in plainer English in a Microsoft blog post: “Humans have no trouble distinguishing between a sheep and a cow. But computers are not perfect with these simple tasks. However, when it comes to distinguishing between different breeds of sheep, this is where computers outperform humans. The computer can be trained to look at the detail, texture, shape and context of the image and see distinctions that can’t be observed by humans.”

If you’re interested in learning how deep learning works, why it’s such a hot area right now and how it’s being applied commercially, think about attending our Structure Data conference, which takes place March 18 and 19 in New York. Speakers include deep learning and machine learning experts from Facebook, Yahoo, Microsoft, Spotify, Hampton Creek, Stanford and NASA, as well as startups Blue River Technology, Enlitic, MetaMind and TeraDeep.

We’ll dive even deeper into artificial intelligence at our Structure Intelligence conference (Sept. 22 and 23 in San Francisco), where early confirmed speakers come from Baidu, Microsoft, Numenta and NASA.

6 Comments

B. Rabbit

Like watching those TV ads for TVs with awesome resolution… on your crappy TV. This stuff is a trap for anyone who can’t think outside the box or lacks a healthy dose of critical thinking. Even Elon Musk and Gigaom seem to have fallen for the trap.

Explanation for those inside the box: you’re looking at images with a resolution supposedly beyond human capabilities, with your limited eyes, and on a screen that may or may not have the highest resolution. So why am I not impressed?

Pratik Mehta

Isn’t that the point of all AI research? Taking advantage of that difference and being able to do a task better than we can do manually. It matters greatly in so many ways, like robots being able to truly ‘see’ instead of only using standard sensors with limited capability.

Alexander

“However, when it comes to distinguishing between different breeds of sheep, this is where computers outperform humans.”
Not true and totally wrong. Any shepherd would outperform MS with ease. It’s wrong to expect an “average” human to distinguish between different breeds while MS keeps these details in a database.

Derrick Harris

But can a shepherd also distinguish between different breeds of dogs, instruments, food, flowers, etc.? That’s kind of the point.

Trylks

Do you know whether training and testing sets are disjoint?

This could be just a race for overfitting. A complex one, but overfitting after all.
