What is a Convolutional Neural Network (CNN)?

A neural network does advanced image analysis by mathematically breaking down properties of visual data.


What it is: A Convolutional Neural Network (CNN) is a neural network that uses a process called kernel convolution to break down a graphical image into its essential properties. By combining this with deep learning, CNNs can ‘understand’ image data, which leads to the possibility of making predictions about, or diagnostics of it. For example, a CNN can determine whether a given structure in a picture is a windmill, or make guesses about what year a portrait was taken.

What it does: It is difficult to analyze an image as millions of pixels exist in a high-resolution picture, thus using a normal deep learning algorithm on millions of inputs would take eons. However, CNNs bypass this difficulty with kernel convolution, a process based on that involved in image filters, such as blurs, sharpeners, and edge finders. By applying such convolutions repeatedly, CNNs break down images into hundreds of constituent properties–for example symmetrical aspects, amount of turquoise, or number of squares–and then run the deep learning algorithm on those properties.

Why it matters: CNNs have achieved previously unheard-of performance levels in the areas of image recognition and analysis. This capability has a variety of applications, including facial recognition, handwriting analysis, and automatic image captioning, can be accomplished with CNNs. Additional applications using non-image data are being explored.

What to do about it: If your enterprise does anything related to image analysis, such as facial recognition, CNNs could make a massive improvement on your existing solution. Also, consider that CNNs can do surprising things with image data–real estate listing app developers, for example, might take note of the ability of CNNs to estimate the price of a house, based on an image of it.


  • Makes detailed analysis of high-resolution images faster and more efficient
  • Can be applied to any data that can be expressed as an image
  • Can provide insight into physical systems by identifying important elements of a visual dataset


  • Advanced facial recognition
  • Automatic image captioning
  • Handwriting analysis
  • Document analysis

Potential Pitfalls

Although CNNs can achieve incredible results, they can be fooled by adversarial data, which is to say, images altered such that the network misclassifies them. This Quora answer from Google’s Nick R. Feller shows how a subtle layer of random noise can get a sophisticated CNN to confuse a panda for a gibbon. Some experiments in adversarial data include makeup/hairstyles designed to fool facial recognition networks.

Case Study

One novel application of CNNs is the addition of sound to silent videos. A team from MIT trained a neural network on footage of a drumstick hitting various objects, and then used the neural network to generate sound for similar footage that had been silenced. The resulting sounds were convincing and realistic. This result demonstrates the breadth of possible applications of CNNs.