Artificial Neural Networks

Artificial Neural Networks (ANNs) are computing systems, structured as graphs, that are loosely modeled on the biological neural networks constituting animal brains. Their main components are:

  • Input Nodes - take in feature data used for model training and prediction processing

  • Hidden Nodes - sit between the input and output nodes; they take in data and apply processes such as activation functions to produce outputs that are sent on to other nodes

  • Output Nodes - receive the results of the processing performed by the hidden nodes

  • Data Array Links - connect and pass data between nodes

  • Weights - are adjusted by the model training process to modify the data array links so that the network produces increasingly accurate output results; the activation functions operate on the weighted data input values, as sketched in the code below

The diagram below shows the high-level conceptual components of an ANN. There are many variations on this basic architecture, such as recurrent and convolutional networks.
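
In code, the computation performed at a single node can be sketched as follows. This is a minimal illustration, assuming a ReLU activation function; the names and values are made up for the example, not taken from any library.

import numpy as np

def relu(x):
    # ReLU activation: pass positive values through, clamp negatives to zero.
    return np.maximum(0, x)

def node_output(inputs, weights, bias):
    # Weight each input, sum, add the bias, then apply the activation function.
    return relu(np.dot(inputs, weights) + bias)

inputs = np.array([0.5, -1.2, 3.0])   # feature data arriving over data array links
weights = np.array([0.8, 0.1, -0.4])  # adjusted by the model training process
bias = 0.2
print(node_output(inputs, weights, bias))  # the node's output, sent to other nodes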

Key aspects of ANNs include:

  • Flexibility - ANNs can be configured to address a wide variety of Machine Learning applications

  • Accuracy - in many applications, ANNs have achieved accuracies exceeding those of humans

  • Advancements - advances in ANN technology over the past few years have made them among the most widely used Machine Learning techniques

Model Training

Data is iteratively processed through the neural network while backpropagation adjusts the weights and biases applied to the data array links, producing increasingly accurate output results. A minimal sketch of one training iteration appears after the list below.

  • Data Inputs - data is fed into the training process

  • Iteration - data is iteratively passed through the neural network

  • Forward Propagation - data is passed from node to node

  • Outputs - output results are fed into loss calculations

  • Loss Calculation - the difference between output results and desired results is calculated

  • Weight Optimization - the amount of change to data flow weights is calculated

  • Backpropagation - modifies the weights and biases applied to data array links
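
The sketch below walks through one training iteration in NumPy, covering forward propagation, loss calculation, backpropagation, and weight optimization. It assumes a single hidden layer, sigmoid activations, and a squared-error loss, and omits biases for brevity; it is an illustration, not the implementation scikit-learn uses.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                  # data inputs: 4 samples, 3 features
y = np.array([[0.0], [1.0], [1.0], [0.0]])  # desired results
W1 = rng.normal(size=(3, 5))                 # input-to-hidden weights
W2 = rng.normal(size=(5, 1))                 # hidden-to-output weights
learning_rate = 0.1

# Forward propagation: data is passed from node to node.
hidden = sigmoid(X @ W1)
output = sigmoid(hidden @ W2)

# Loss calculation: the difference between outputs and desired results.
loss = np.mean((output - y) ** 2)
print("loss:", loss)

# Backpropagation: gradients of the loss flow backward through the network.
grad_output = 2 * (output - y) / len(y) * output * (1 - output)
grad_W2 = hidden.T @ grad_output
grad_hidden = (grad_output @ W2.T) * hidden * (1 - hidden)
grad_W1 = X.T @ grad_hidden

# Weight optimization: step each weight against its gradient.
W2 -= learning_rate * grad_W2
W1 -= learning_rate * grad_W1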

Prediction Processing

Data is passed forward through the neural network to produce a result and an associated confidence level that the result is correct. A sketch of one common way to compute such a confidence level follows the list below.

  • Data Inputs - data is fed into the prediction process

  • Forward Propagation - data is passed from node to node

  • Outputs - output results are fed into confidence level calculations

  • Confidence Level - is a number from 0 to 1 indicating the probability that the output result is correct
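
One common way to produce such a confidence level is to pass the raw output-node values through a softmax function, which converts them into probabilities that sum to 1. The sketch below is illustrative; the values are made up.

import numpy as np

def softmax(z):
    # Exponentiate (shifted by the max for numerical stability) and normalize
    # so the outputs sum to 1 and can be read as class probabilities.
    e = np.exp(z - np.max(z))
    return e / e.sum()

raw_outputs = np.array([1.2, 0.3, 2.5])      # one value per output node
probabilities = softmax(raw_outputs)
predicted_class = int(np.argmax(probabilities))
confidence = probabilities[predicted_class]  # a number from 0 to 1
print(predicted_class, confidence)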

Processing Enhancements

Processing enhancements include methods such as the following, both sketched in code after the list:

  • Batch Normalization - normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation

  • Batch Gradient Descent - averages the gradients of training examples and uses the mean to update parameters
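
The sketch below illustrates both calculations on small, made-up NumPy arrays.

import numpy as np

# Batch Normalization: subtract the batch mean from each feature and divide
# by the batch standard deviation (epsilon guards against division by zero).
activations = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
batch_mean = activations.mean(axis=0)
batch_std = activations.std(axis=0)
normalized = (activations - batch_mean) / (batch_std + 1e-8)
print(normalized)

# Batch Gradient Descent: average the per-example gradients and use the
# mean to update the parameters.
per_example_gradients = np.array([[0.2, -0.1], [0.4, 0.3], [0.0, 0.1]])
mean_gradient = per_example_gradients.mean(axis=0)
learning_rate = 0.01
parameters = np.zeros(2)
parameters -= learning_rate * mean_gradient
print(parameters)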

Python Example

This code example uses a number of hyperparameters to control aspects of model instantiation and training. A selection of the main hyperparameters is described below; for more information on the processing functions used and on additional hyperparameters, see the scikit-learn MLPClassifier documentation.

  • activation_function: which activation function the hidden nodes use

  • batch_size: the number of input samples processed in each iteration before the weights are updated

  • hidden_network_layers: a tuple giving the number of nodes in each hidden network layer; hidden layers are those between the input and output layers

  • learning_rate: the schedule used to control the step size during Weight Optimization (for example, 'adaptive')

  • maximum_number_of_iterations: the maximum number of times the data is iteratively processed through the neural network

  • number_of_data_features: the number of data features used for model training and inference processing

  • number_of_informative_data_features: the number of data features correlated to the training outputs; this simulates real world model training where the correlation of data features may not be known

  • number_of_model_classes: the number of output classes the neural network is being trained to predict

  • number_of_prediction_tests: the number of prediction tests included in the example code

  • number_of_training_and_test_samples: the number of data samples generated for model training and testing

  • print_training_progress: whether to print the loss after each training iteration; loss is a measure of the difference between calculated outputs and expected outputs

  • tolerance_for_optimization: the minimum loss improvement required for the model training iteration cycles to continue

  • weight_optimization_algorithm: the algorithm used for Weight Optimization, such as Stochastic Gradient Descent


"""
neural_network_with_scikit_learn.py
creates, trains and tests an artificial neural network

With the parameter values set as they are,
running the code may take as much as a few minutes to finish.
To reduce the running time, reduce the parameter value for:
number_of_training_and_test_samples
"""
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Set parameters.
number_of_training_and_test_samples = 10000
number_of_data_features = 60
batch_size = min(1000, number_of_training_and_test_samples)
number_of_informative_data_features = 50
number_of_model_classes = 8
number_of_prediction_tests = 30
activation_function = 'relu'
hidden_network_layers = (50, 50, 50)
weight_optimization_algorithm = 'sgd'
learning_rate = 'adaptive'
tolerance_for_optimization = 1e-5
maximum_number_of_iterations = 10000
random_state = 1
print_training_progress = True

# Generate model training and test data.
X, y = make_classification(n_samples=number_of_training_and_test_samples,
                           n_features=number_of_data_features,
                           n_informative=number_of_informative_data_features,
                           n_classes=number_of_model_classes,
                           random_state=random_state)

# Split the classification data into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    stratify=y,
                                                    random_state=random_state)

# Instantiate a neural network classifier.
classifier = MLPClassifier(random_state=random_state,
                           hidden_layer_sizes=hidden_network_layers,
                           batch_size=batch_size,
                           activation=activation_function,
                           solver=weight_optimization_algorithm,
                           learning_rate=learning_rate,
                           tol=tolerance_for_optimization,
                           max_iter=maximum_number_of_iterations,
                           verbose=print_training_progress)

# Train the classifier.
trained_classifier = classifier.fit(X_train, y_train)

# Get the trained model mean accuracy score using test data.
mean_accuracy = trained_classifier.score(X_test, y_test)
print"Mean Accuracy of All Test Predictions:"
print(mean_accuracy)

# Process test predictions.
test_predictions = trained_classifier.predict(X_test[:number_of_prediction_tests, :])
print("Actual Prediction Test Classes:")
print(y_test[:number_of_prediction_tests])
print("Predicted Test Classes:")
print(test_predictions)

The example output is below:

Iteration 1, loss = 3.12814482
Iteration 2, loss = 2.67163940
Iteration 3, loss = 2.45638738
Iteration 4, loss = 2.34068097
Iteration 5, loss = 2.26461961
Iteration 6, loss = 2.21042757
Iteration 7, loss = 2.16944587
Iteration 8, loss = 2.13762949
Iteration 9, loss = 2.11139347
Iteration 10, loss = 2.08858667
Iteration 11, loss = 2.06853117
.
.
.
Iteration 3104, loss = 0.01097677
Iteration 3105, loss = 0.01097640
Iteration 3106, loss = 0.01097621
Training loss did not improve more than tol=0.000010 for two consecutive epochs. Setting learning rate to 0.000002
Iteration 3107, loss = 0.01097576
Iteration 3108, loss = 0.01097568
Iteration 3109, loss = 0.01097563
Training loss did not improve more than tol=0.000010 for two consecutive epochs. Setting learning rate to 0.000000
Iteration 3110, loss = 0.01097555
Iteration 3111, loss = 0.01097553
Iteration 3112, loss = 0.01097552
Training loss did not improve more than tol=0.000010 for two consecutive epochs. Learning rate too small. Stopping.
Mean Accuracy of All Test Predictions:
0.6672
Actual Prediction Test Classes:
[0 4 3 7 5 5 4 0 0 6 2 0 6 1 6 3 4 0 2 2 4 0 2 4 5 3 0 2 3 5]
Predicted Test Classes:
[0 4 3 7 6 5 4 0 0 6 2 0 6 1 6 3 4 0 2 2 7 5 2 4 0 3 0 2 3 5]
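
To inspect the confidence levels behind these predictions, scikit-learn's MLPClassifier provides a predict_proba method that returns the per-class probabilities. The short addition below, intended to run after the example code above, prints the highest class probability for each test sample.

# Print the confidence level of each test prediction.
probabilities = trained_classifier.predict_proba(X_test[:number_of_prediction_tests, :])
print("Confidence Level of Each Predicted Test Class:")
print(probabilities.max(axis=1))  # the highest class probability per sample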