Gradient Boosting

Gradient Boosting is a machine learning technique for improving prediction results. It has these characteristics:

  • The objective is to improve prediction results by stepping down the gradient of the prediction error, that is, to boost the results.

  • Gradient boosting uses an ensemble of models to achieve its result.

  • It starts with a weak learner, usually a decision tree.

  • Each model is created sequentially based on the results of the previous model.

  • Gradient descent (or a stochastic variant that subsamples the training data at each stage) is used to minimize the loss when adding models, as sketched in the example below.
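
To make these ideas concrete, the sketch below implements gradient boosting for squared-error loss by hand, using scikit-learn's DecisionTreeRegressor as the weak learner. The function names fit_gradient_boosting and predict_gradient_boosting are illustrative only, not part of any library: each stage fits a small tree to the current residuals (the negative gradient of the squared-error loss) and adds a scaled-down copy of its predictions to the running prediction.

# Minimal sketch of gradient boosting for squared-error loss.
# fit_gradient_boosting and predict_gradient_boosting are illustrative
# names, not part of scikit-learn's API.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_stages=100, learning_rate=0.1, max_depth=3):
    # Stage 0: start from a constant prediction (the mean of y).
    baseline = np.mean(y)
    prediction = np.full(len(y), baseline)
    trees = []
    for _ in range(n_stages):
        # For squared-error loss, the negative gradient is the residual.
        residuals = y - prediction
        # Fit a weak learner (a shallow decision tree) to the residuals.
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        # Shrink the new tree's contribution by the learning rate.
        prediction = prediction + learning_rate * tree.predict(X)
        trees.append(tree)
    return baseline, trees

def predict_gradient_boosting(baseline, trees, X, learning_rate=0.1):
    # Sum the constant baseline and the scaled contribution of every tree.
    prediction = np.full(X.shape[0], baseline)
    for tree in trees:
        prediction = prediction + learning_rate * tree.predict(X)
    return prediction

The learning rate in this sketch plays the same role as the learning_rate parameter of GradientBoostingRegressor in the example below: it shrinks each tree's contribution, trading more boosting stages for better generalization.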

Python Example

Details of the GradientBoostingRegressor model are available in the scikit-learn documentation.

"""
gradient_boosting_using_scikit_learn.py
uses a GradientBoostingRegressor model to demonstrate gradient boosting
"""

# Import needed libraries
import numpy as np
import matplotlib.pyplot as plotlib
from sklearn import ensemble
from sklearn import datasets
from sklearn.utils import shuffle
from sklearn.metrics import mean_squared_error

# Define the number of boosting stages.
# Gradient boosting is fairly robust to over-fitting,
# so a larger number can result in better performance.
number_of_estimators = 500

# Define the maximum depth of each individual regression tree.
maximum_depth = 4

# Define the minimum number of samples required to split an internal node.
minimum_sample_split = 2

# Define the learning rate.
# Learning rate shrinks the contribution of each tree by learning_rate.
# There is a trade-off between learning_rate and n_estimators.
learning_rate = 0.01

# Define the loss function to be optimized.
# In this example, use squared-error (least squares) regression.
# Recent scikit-learn versions name this loss 'squared_error';
# older versions used 'ls'.
loss_function = 'squared_error'

# Define the fraction of the data to use for training;
# the remainder is used for testing.
# Needs to be greater than zero and less than one.
offset_value = 0.4

# Define a random state value so the data shuffle is reproducible.
random_state_value = 13

# Load a sample regression dataset bundled with scikit-learn.
dataset = datasets.load_diabetes()

# Shuffle the feature data (X) and target values (y) together.
X, y = shuffle(dataset.data, dataset.target, random_state=random_state_value)
print("X:")
print(X)
print("y:")
print(y)

# Convert the X data to floating point.
X = X.astype(np.float32)

# Compute the train/test split index from the offset_value fraction.
offset = int(X.shape[0] * offset_value)
print("offset:")
print(offset)

# Create training and test data.
X_train, y_train = X[:offset], y[:offset]
print("X_train:")
print(X_train)
X_test, y_test = X[offset:], y[offset:]
print("X_test:")
print(X_test)

# Instantiate a gradient boosting regression model.
model = ensemble.GradientBoostingRegressor(n_estimators=number_of_estimators,
                                           max_depth=maximum_depth,
                                           min_samples_split=minimum_sample_split,
                                           learning_rate=learning_rate,
                                           loss=loss_function)

# Train the model.
model.fit(X_train, y_train)

# Predict values based on X_test data.
X_predictions = model.predict(X_test)

# Calculate the mean squared error of X_predictions.
mean_squared_error_value = mean_squared_error(y_test, X_predictions)
print("X_prediction mean_squared_error:")
print(mean_squared_error_value)

# Get the training set deviance (loss) at each boosting stage.
train_score = model.train_score_

# Compute the test set deviance (loss) at each boosting stage.
# staged_predict yields the predictions after each stage; for
# squared-error loss the deviance is the mean squared error.
test_score = np.zeros((number_of_estimators,), dtype=np.float64)
for stage, y_pred in enumerate(model.staged_predict(X_test)):
    test_score[stage] = mean_squared_error(y_test, y_pred)

# Plot the deviance score curves.
plotlib.figure(figsize=(24, 6))
plotlib.plot(np.arange(number_of_estimators) + 1,
             train_score, 'b-',
             label='Training Set Deviance (Loss)')
plotlib.plot(np.arange(number_of_estimators) + 1,
             test_score, 'r-',
             label='Test Set Deviance (Loss)')
plotlib.legend(loc='upper right')
plotlib.xlabel('Gradient Boosting Iteration Stages')
plotlib.ylabel('Deviance (Loss)')

# Display the plot.
plotlib.show()

Output is shown below. (Note: this sample output was generated with the Boston house-prices dataset of 506 samples and 13 features, which newer versions of scikit-learn no longer include; running the code with the diabetes dataset loaded above produces different values.)
X:
[[1.50234e+01 0.00000e+00 1.81000e+01 ... 2.02000e+01 3.49480e+02
  2.49100e+01]
 [5.44114e+00 0.00000e+00 1.81000e+01 ... 2.02000e+01 3.55290e+02
  1.77300e+01]
 [1.00245e+00 0.00000e+00 8.14000e+00 ... 2.10000e+01 3.80230e+02
  1.19800e+01]
 ...
 [7.89600e-02 0.00000e+00 1.28300e+01 ... 1.87000e+01 3.94920e+02
  6.78000e+00]
 [7.02200e-02 0.00000e+00 4.05000e+00 ... 1.66000e+01 3.93230e+02
  1.01100e+01]
 [3.30600e-02 0.00000e+00 5.19000e+00 ... 2.02000e+01 3.96140e+02
  8.51000e+00]]
y:
[12.  15.2 21.  24.  19.4 22.2 23.3 15.6 20.8 13.8 19.6 27.1 36.5 15.2
 11.7 14.1 17.2 16.8 32.9 21.4 32.4 23.5 20.4 13.1 12.6 10.4 50.  23.1
 13.4 24.3 25.   7.4  7.  22.  15.3  8.4 16.4 18.1 43.8  8.5 18.6 21.1
 50.  11.8 17.4 33.3 14.8  8.8 26.6 16.8 30.1 23.7 50.  19.5 16.1 24.1
 20.4 36.4 41.3 21.7 21.7 14.  21.7 20.4 20.  34.7 24.5 11.7 14.3 13.1
 17.4 20.1 19.5 21.  30.1 18.4 34.6 20.1 43.5 21.6 18.3 21.4 18.9 13.4
 30.8 25.  25.2  8.8 31.1 13.4 48.3 17.8  5.6 12.7 16.1 20.9 19.9 13.9
 22.6 21.2 21.2 22.9 20.5 22.8 19.4 21.7 23.1 26.5 18.5 20.2 27.5 50.
 21.9 23.4 32.7 14.9 15.6 20.3 11.9 30.5 31.6 21.9 25.  23.  17.   7.2
 44.8 16.  38.7 20.4 22.5 21.7 12.7  5.  21.4 23.7 21.  19.5 20.1 24.6
 36.1 23.  18.5 32.5 19.1 23.3 18.5 21.5 19.3 26.4 31.  22.3 13.3  7.
 22.5 27.5 30.1 10.2 20.  25.  17.8 13.8 32.  23.7 23.8 16.7 23.8 18.8
 22.  29.  21.2 33.1 30.7 24.8 21.7 37.6 23.1 22.9 13.6 14.6 18.9 22.6
 31.7 19.4 12.7 20.1 30.3 18.8  8.1 20.6 33.2 21.1 31.5 20.   8.5 22.2
 24.8 50.  21.7 24.  15.  13.5  9.7 18.9 22.6 29.6 20.6 24.3 16.2 19.6
 35.1 17.5 12.5 22.2 22.9 34.9 28.  17.4  7.2 23.4 21.2 27.9 20.2 22.7
 26.2 50.  32.  20.7 15.  17.2 23.1 10.9 21.5 17.2 35.2 10.9 23.8 17.8
 25.  10.5 26.6 11.9 20.3 21.9 14.5 10.8 23.1 25.  14.9  6.3 24.2 13.2
 24.7 19.8 18.5 23.9 29.6 18.7 29.1 10.5 32.2 50.  35.4  7.5 16.3 25.
 25.3 19.1 28.7 14.3 23.1 19.8 17.5 20.   8.3 23.2 26.7 17.8 19.3 18.
 10.2 22.2 28.4 21.2 11.  34.9 36.2 19.7 22.5 18.7 29.  13.5 22.4 18.4
 36.2 28.6 14.1 33.  50.  19.1 24.7 24.5 19.  23.3 22.9 28.2 24.1 26.4
 50.  48.5 11.3 29.4 10.2 13.6 13.  24.4 15.6  9.6 22.3 19.9 46.7 19.2
 20.7 24.4  5.  22.8 19.1 29.8 13.8 18.2 46.  18.3 29.8 14.2 21.4 19.6
 19.3 20.  24.8 37.9 24.8 24.6 22.6 16.1 10.4 14.1 23.9 50.  25.  19.6
 18.6 16.5 33.4 19.4 20.6 15.4 20.5 22.4 28.7 20.5 18.2 19.3 24.4 22.
 13.8 14.5 50.  41.7 22.  20.8 12.3 42.8 23.6 23.9 23.  14.4 22.8 50.
 16.6 19.9 20.1 24.7 22.1 12.1 42.3 17.1 24.4 29.9 17.1 22.  20.6 35.4
 33.4 19.  34.9 15.1 22.  33.8  8.7 27.9 33.2 37.3  7.2 19.7 31.6 50.
 12.8 22.7 23.3 13.3 20.3 24.5 19.6 16.6 11.8 50.  13.9 20.8 19.5 33.1
 14.4 19.3 16.2 13.1 23.9 19.2 20.6 21.8 20.3 23.6 28.7 26.6 44.  43.1
 14.6 27.5 16.7 37.  19.8 29.1 27.5 23.2 13.3 50.  50.  16.5 23.7 14.9
 48.8 17.3 23.2 22.2  9.5 18.7 20.9 15.6 28.4 28.1 31.2 13.1 37.2 22.
 11.5 13.8 39.8 28.5 15.2 23.8 19.4 27.1 18.9 17.9 45.4 15.6 21.6 21.4
 19.9 17.8 23.  15.4  8.3 27.  36.  22.8 17.1 22.6 23.9 17.7 31.5  8.4
 14.5 13.4 15.7 17.5 15.  21.8 18.4 25.1 19.4 17.6 18.2 24.3 23.1 24.1
 23.2 20.6]
offset:
202
X_train:
[[1.50234e+01 0.00000e+00 1.81000e+01 ... 2.02000e+01 3.49480e+02
  2.49100e+01]
 [5.44114e+00 0.00000e+00 1.81000e+01 ... 2.02000e+01 3.55290e+02
  1.77300e+01]
 [1.00245e+00 0.00000e+00 8.14000e+00 ... 2.10000e+01 3.80230e+02
  1.19800e+01]
 ...
 [3.30450e-01 0.00000e+00 6.20000e+00 ... 1.74000e+01 3.76750e+02
  1.08800e+01]
 [1.96091e+01 0.00000e+00 1.81000e+01 ... 2.02000e+01 3.96900e+02
  1.34400e+01]
 [1.61282e+00 0.00000e+00 8.14000e+00 ... 2.10000e+01 2.48310e+02
  2.03400e+01]]
X_test:
[[1.15779e+01 0.00000e+00 1.81000e+01 ... 2.02000e+01 3.96900e+02
  2.56800e+01]
 [1.70040e-01 1.25000e+01 7.87000e+00 ... 1.52000e+01 3.86710e+02
  1.71000e+01]
 [4.26131e+00 0.00000e+00 1.81000e+01 ... 2.02000e+01 3.90740e+02
  1.26700e+01]
 ...
 [7.89600e-02 0.00000e+00 1.28300e+01 ... 1.87000e+01 3.94920e+02
  6.78000e+00]
 [7.02200e-02 0.00000e+00 4.05000e+00 ... 1.66000e+01 3.93230e+02
  1.01100e+01]
 [3.30600e-02 0.00000e+00 5.19000e+00 ... 2.02000e+01 3.96140e+02
  8.51000e+00]]
X_prediction mean_squared_error:
19.748551548039462
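
The per-stage test deviance computed in the example can also be used to choose the number of boosting stages. A minimal sketch, assuming the test_score array from the code above is still in scope:

# Find the boosting stage with the lowest test set deviance.
best_stage = int(np.argmin(test_score)) + 1
print("Boosting stage with the lowest test deviance:")
print(best_stage)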