Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are a class of Artificial Neural Networks with applications in areas such as image classification and object detection.

CNN Structure

A typical CNN stacks a series of Convolution layers, each usually followed by an Activation Function and often by a Pooling layer. A network that contains many such hidden layers is referred to as a Deep Learning network.
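To make the layer stacking concrete, the sketch below tracks how the spatial size of an image shrinks as it passes through a small stack of layers. The layer sizes, the `conv_output_size` helper, and the 32x32 input are assumptions chosen for illustration; they do not come from any particular model.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window slides over n pixels."""
    return (n + 2 * padding - kernel) // stride + 1


# Hypothetical stack: (layer type, window size, stride)
layers = [("conv", 3, 1), ("pool", 2, 2), ("conv", 3, 1), ("pool", 2, 2)]

size = 32  # assumed 32x32 input image
sizes = [size]
for name, kernel, stride in layers:
    size = conv_output_size(size, kernel, stride)
    sizes.append(size)

print(sizes)  # [32, 30, 15, 13, 6]
```

Each unpadded 3x3 convolution trims the border by two pixels, and each 2x2 pooling step halves the width and height.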

Convolution

Convolution filters, also called Kernels, extract features from their input. During the forward pass, each filter slides across the width and height of its input, computing the dot product between the entries of the filter and the input at each position and producing a 2-dimensional activation map for that filter. As a result, the network learns filters that activate when they detect a specific type of feature at some spatial position in the input.
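The sliding dot product can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a single-channel image, no padding, and a stride of 1. The `conv2d_valid` name, the sample image, and the hand-written edge kernel are hypothetical; in a real CNN the kernel values are learned during training.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image and take a dot product at each position.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it is a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # dot product of patch and kernel
    return out


# Hypothetical 5x5 image: dark on the left, bright on the right
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hand-written kernel that responds to dark-to-bright vertical edges
edge_kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

activation_map = conv2d_valid(image, edge_kernel)
print(activation_map)
```

The activation map is large where the window straddles the edge and zero where the window covers a uniform region, which is the sense in which a filter "activates" on a feature at a given position.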

See the Convolution page for details.

Pooling

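Pooling reduces the spatial size of the feature maps produced by convolution, using functions that calculate values such as the maximum or the average of each small window of the input.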
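As a rough illustration of pooling, the sketch below applies maximum and average pooling to a small feature map using non-overlapping 2x2 windows. The `pool2d` helper and the sample values are assumptions made for this example.

```python
import numpy as np


def pool2d(feature_map, size=2, reduce=np.max):
    """Apply `reduce` to each non-overlapping size x size window."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop rows/columns that do not fill a window
    windows = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return reduce(windows, axis=(1, 3))


# Hypothetical 4x4 feature map
feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 6],
                        [2, 2, 7, 8]], dtype=float)

max_pooled = pool2d(feature_map, 2, np.max)   # [[4, 2], [2, 8]]
avg_pooled = pool2d(feature_map, 2, np.mean)  # [[2.5, 1.0], [1.25, 6.5]]
```

Both variants turn the 4x4 input into a 2x2 output; max pooling keeps the strongest response in each window, while average pooling smooths it.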

See the Pooling page for details.

Activation

Activation functions define the output of a graph node given a set of inputs. Activation functions:

  • provide the capability to introduce non-linearity in order to model non-linear aspects of the real world

  • are usually monotonic - that is, their output never decreases (or never increases) as the input grows - which helps keep neural network training stable
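The two properties listed above can be checked numerically for a few common activation functions. The choice of ReLU, sigmoid, and tanh here is an assumption for illustration, and `is_non_decreasing` is a hypothetical helper.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def is_non_decreasing(values):
    """True when each value is at least as large as the one before it."""
    return bool(np.all(np.diff(values) >= 0))


x = np.linspace(-3, 3, 7)  # -3, -2, ..., 3

relu_out = relu(x)
sigmoid_out = sigmoid(x)
tanh_out = np.tanh(x)

# Monotonic: outputs never decrease as the input grows
monotonic = [is_non_decreasing(v) for v in (relu_out, sigmoid_out, tanh_out)]

# Non-linear: f(a) + f(b) differs from f(a + b), here with a = -1 and b = 1
relu_is_nonlinear = relu(-1.0) + relu(1.0) != relu(0.0)
```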

See the Activation Functions page for details.

CNN Variations and Enhancements

CNN variations and enhancements include:

  • R-CNN - Regions with CNN Features; extracts separate image regions for CNN processing of each region

  • Fast R-CNN - CNN processing is done only once per image; a pooling layer then extracts the features for each region from the shared feature map for final processing

  • Faster R-CNN - instead of using a selective search algorithm to identify image regions, a separate network is used to predict the region proposals

  • YOLO - You Only Look Once; unlike a region-based CNN, it processes the whole image in a single pass, with a single CNN predicting bounding boxes and class probabilities for those boxes

  • Single Shot Detection - in SSD, object localization and classification are done in a single forward pass

  • MultiBox - technique for multiple object bounding box detection

  • Detector - CNN that detects and classifies objects in an image

  • Transposed Convolution - a learnable upsampling layer that expands a compact, abstract feature map back to a larger spatial size

  • Unpooling/Deconvolution - approximately reverses the pooling and convolution steps to produce a smooth output at the size of the original image

  • Downsampling - reduces the size of the output image from a convolution pass, for example with pooling or strided filters

  • Upsampling - the reverse of pooling; it increases the size of the output image from a convolution pass

  • Region Proposal Network - an RPN quickly scans locations in an image to assess whether further processing needs to be performed in a given region

  • Object Detection with Object Recognition - combines capabilities such as OpenCV for object detection with TensorFlow for object recognition for images containing multiple objects
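Of the items above, upsampling is the simplest to sketch. The example below uses nearest-neighbour repetition, which is only one of several possible upsampling methods and is chosen here as an assumption. The `upsample_nearest` name is hypothetical.

```python
import numpy as np


def upsample_nearest(feature_map, factor=2):
    """Repeat every value `factor` times along both rows and columns."""
    rows_repeated = np.repeat(feature_map, factor, axis=0)
    return np.repeat(rows_repeated, factor, axis=1)


small = np.array([[1, 2],
                  [3, 4]])

large = upsample_nearest(small, 2)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```

Applying 2x2 max pooling to `large` gives back `small`, which is the sense in which upsampling reverses pooling. A transposed convolution performs a similar size increase but with learned weights rather than plain repetition.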

TensorFlow Inception CNN Models

The TensorFlow Inception model is a popular CNN which now has a number of versions available:

  • V1 - the original Inception model, a deep learning CNN with aspects such as filters of multiple sizes and dimension reduction

  • V2 - changed node groupings to improve performance

  • V3 - included improvements such as training with the RMSprop optimizer, which limits oscillations during stochastic gradient descent

  • V4 - included improvements such as reduction blocks to change the width and height of the network grid

  • Inception-ResNet - similar to V4; adds residual (skip) connections around the Inception blocks
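The dimension reduction mentioned for V1 is commonly done with 1x1 convolutions, which mix channels at each pixel without looking at neighbouring pixels. The sketch below is a minimal NumPy illustration under that assumption. The shapes, the random weights, and the `conv1x1` name are hypothetical and are not taken from the Inception code.

```python
import numpy as np


def conv1x1(feature_maps, weights):
    """Apply a 1x1 convolution.

    feature_maps: array of shape (height, width, channels_in)
    weights:      array of shape (channels_in, channels_out)
    """
    # Matrix multiplication over the last axis mixes channels at every pixel
    return feature_maps @ weights


rng = np.random.default_rng(0)
features = rng.normal(size=(8, 8, 64))  # assumed 64 input channels
weights = rng.normal(size=(64, 16))     # reduce to 16 channels

reduced = conv1x1(features, weights)
print(reduced.shape)  # (8, 8, 16)
```

Reducing the channel count this way before applying larger filters lowers the amount of computation those filters need.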
