Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are a class of Artificial Neural Networks used mainly on grid-structured data such as images, with applications in areas such as image classification and object detection.

CNN Structure

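A typical CNN stacks Convolution layers, each usually followed by an Activation Function, with Pooling layers interleaved to reduce the size of the feature maps. Networks that contain many such hidden layers are called deep networks, and training them is referred to as Deep Learning.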
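
To make the layer stacking concrete, the sketch below tracks how the width and height of a square feature map change as it passes through a stack of convolution and pooling layers. It uses the standard size rule out = floor((in + 2 * padding - kernel) / stride) + 1. The layer names, the 28x28 input, and the kernel sizes are illustrative assumptions, not values taken from this page.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a kernel-sized window slides over `size` cells."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical layer stack: (name, kernel, stride, padding)
LAYERS = [
    ("conv1", 3, 1, 0),
    ("pool1", 2, 2, 0),
    ("conv2", 3, 1, 0),
    ("pool2", 2, 2, 0),
]


def trace_sizes(input_size, layers=LAYERS):
    """Return (layer name, feature map size) after each layer in the stack."""
    sizes = [("input", input_size)]
    size = input_size
    for name, kernel, stride, padding in layers:
        size = output_size(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes


if __name__ == "__main__":
    for name, size in trace_sizes(28):
        print(f"{name}: {size}x{size}")
```

Activation functions are left out of the trace because they are applied element by element and do not change the size of a feature map.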

Convolution

Convolution filters, also called Kernels, are small matrices of learned weights that extract features such as edges and textures from their input. During the forward pass, each filter slides across the width and height of its input, computing the dot product between the entries of the filter and the input patch at each position, and produces a two-dimensional activation map for that filter. As a result, the network learns filters that activate when they detect some specific type of feature at some spatial position in the input.
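
The sliding dot product described above can be sketched in plain Python. This is a minimal illustration that assumes a single-channel input, stride 1, and no padding. The sample image and the vertical-edge kernel are made up for the example, and in a real CNN the kernel values are learned rather than hand-written.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the input patch at (i, j).
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Illustrative 4x4 image: dark left half (0), bright right half (1).
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written kernel that responds to dark-to-bright vertical edges.
VERTICAL_EDGE = [
    [-1, 1],
    [-1, 1],
]

if __name__ == "__main__":
    for row in conv2d(IMAGE, VERTICAL_EDGE):
        print(row)
```

The output is non-zero only in the middle column, where the edge sits, which is what "activating at a spatial position" means in practice.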

See the Convolution page for details.

Pooling

Pooling:

down-samples matrix cell groups
preserves detected features

using functions that calculate a single summary value for each group, such as its maximum (max pooling) or its average (average pooling).

See the Pooling page for details.
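
As a rough illustration of pooling, the sketch below down-samples a feature map by replacing each group of cells with one value. The 2x2 window, the stride of 2, and the sample feature map are assumptions chosen for the example. Max and average are shown as two common reduction functions.

```python
def pool2d(feature_map, size=2, stride=2, reduce=max):
    """Replace each size x size group of cells with one value computed by `reduce`."""
    output = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(reduce(window))
        output.append(row)
    return output


def mean(values):
    """Average of a list of numbers, used for average pooling."""
    return sum(values) / len(values)


# Illustrative 4x4 feature map.
FEATURE_MAP = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]

if __name__ == "__main__":
    print(pool2d(FEATURE_MAP))               # max pooling
    print(pool2d(FEATURE_MAP, reduce=mean))  # average pooling
```

Max pooling keeps the strongest response in each group, which is how a detected feature survives the down-sampling.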

Activation

Activation functions define the output of a graph node given a set of inputs. Activation functions:

provide the capability to introduce non-linearity in order to model non-linear aspects of the real world
are usually monotonic - that is, their output never decreases (or never increases) as the input grows - which helps keep neural network training stable
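
The two properties above can be checked on a few widely used activation functions. ReLU, sigmoid, and tanh are not named on this page and are included here only as common examples. The helper `is_non_decreasing` is a hypothetical name introduced for this sketch.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes out negative ones."""
    return max(0.0, x)


def sigmoid(x):
    """Squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Squashes any real input into the range (-1, 1)."""
    return math.tanh(x)


def is_non_decreasing(fn, inputs):
    """Check monotonicity of `fn` on a sorted list of sample inputs."""
    outputs = [fn(x) for x in inputs]
    return all(a <= b for a, b in zip(outputs, outputs[1:]))


if __name__ == "__main__":
    samples = [-3.0, -1.0, 0.0, 1.0, 3.0]
    for fn in (relu, sigmoid, tanh):
        print(fn.__name__, is_non_decreasing(fn, samples))
    # Non-linearity: f(a) + f(b) differs from f(a + b).
    print(relu(-1.0) + relu(1.0), relu(-1.0 + 1.0))
```

The last line shows why ReLU counts as non-linear: a linear function would give the same value on both sides.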

See the Activation Functions page for details.

CNN Variations and Enhancements

CNN variations and enhancements include:

R-CNN - Regions with CNN Features; extracts separate image regions for CNN processing of each region
Fast R-CNN - the CNN is run only once per image; a shared feature map and a region pooling layer are then used for final processing
Faster R-CNN - instead of using a selective search algorithm to identify image regions, a separate network predicts the region proposals from the feature map
YOLO - You Only Look Once; unlike a region-based CNN, a single CNN looks at the whole image once and predicts bounding boxes and class probabilities for those boxes
Single Shot Detection - in SSD, object localization and classification are done in a single forward pass
MultiBox - technique for multiple object bounding box detection
Detector - CNN that detects and classifies objects in an image
Transpose Convolution - expands abstract feature representations back to a larger spatial size using learned filters
Unpooling/Deconvolution - approximately reverses the pooling and convolution process to produce an original-size, smoothed version of the original image
Downsampling - reducing the output image size from a convolution pass using filters
Upsampling - the reverse of pooling; increases the output image size from a convolution pass using filters
Region Proposal Network - an RPN quickly scans locations in an image to assess whether further processing needs to be performed in a given region
Object Detection with Object Recognition - combines capabilities such as OpenCV for object detection with TensorFlow for object recognition for images containing multiple objects
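
The Upsampling entry in the list above can be illustrated with nearest-neighbor upsampling, which is one simple, non-learned way to enlarge a feature map. It is an assumption chosen for brevity; a transpose convolution would use learned filters instead. The function name and the factor of 2 are illustrative.

```python
def upsample_nearest(feature_map, factor=2):
    """Enlarge a 2-D feature map by repeating each cell `factor` times in both directions."""
    output = []
    for row in feature_map:
        # Repeat each value horizontally.
        wide_row = [value for value in row for _ in range(factor)]
        # Repeat the widened row vertically.
        for _ in range(factor):
            output.append(list(wide_row))
    return output


if __name__ == "__main__":
    for row in upsample_nearest([[1, 2], [3, 4]]):
        print(row)
```

A 2x2 input becomes a 4x4 output, the opposite size change to a 2x2 pooling step with stride 2.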

TensorFlow Inception CNN Models

The TensorFlow Inception model is a popular CNN which now has a number of versions available:

V1 - is the original Inception model using a deep learning CNN with aspects such as multiple sized filters and dimension reduction
V2 - changed node groupings to improve performance
V3 - included improvements such as training with an RMSprop optimizer, which damps the vertical oscillations of stochastic gradient descent
V4 - included improvements such as reduction blocks to change the width and height of the network grid
Inception-ResNet - similar to V4; adds residual (skip) connections to the Inception blocks
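
The RMSprop optimizer mentioned for V3 can be sketched as follows. This is the commonly stated form of the update rule, applied to a toy one-parameter loss f(w) = w**2. The learning rate, decay, and epsilon values are assumptions for the example and are not the settings used to train Inception.

```python
import math


def rmsprop_step(weight, grad, avg_sq, lr=0.1, decay=0.9, eps=1e-8):
    """One RMSprop update: scale the step by a running average of squared gradients."""
    avg_sq = decay * avg_sq + (1.0 - decay) * grad * grad
    weight = weight - lr * grad / (math.sqrt(avg_sq) + eps)
    return weight, avg_sq


def minimize(start, steps=200):
    """Minimize the toy loss f(w) = w**2, whose gradient is 2 * w."""
    weight, avg_sq = start, 0.0
    for _ in range(steps):
        weight, avg_sq = rmsprop_step(weight, 2.0 * weight, avg_sq)
    return weight


if __name__ == "__main__":
    print(minimize(5.0))
```

Dividing by the root of the running average shrinks steps along directions with consistently large gradients, which is what limits the oscillations.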