Decision Trees

Decision Trees are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
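
To make "simple decision rules" concrete, here is a minimal hand-coded sketch of the kind of rules a tree learns; the feature names and thresholds are illustrative assumptions, not learned from real data:

# A hand-coded decision rule of the kind a decision tree learns automatically.
# The feature names and thresholds below are illustrative assumptions.
def classify_flower(petal_length, petal_width):
    if petal_length < 2.5:       # decision node: test an attribute
        return "setosa"          # end node: final result
    elif petal_width < 1.8:      # decision node on a second attribute
        return "versicolor"      # end node
    else:
        return "virginica"       # end node

print(classify_flower(1.4, 0.2))  # prints "setosa"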

In decision tree learning, these trees are built automatically from training data to produce predictive models.

Decision trees are made up of:

  • Decision Nodes - usually shown as squares

  • Chance Nodes - usually shown as circles

  • End Nodes - usually shown as triangles

[Figure: Decision-Tree.jpg - example decision tree diagram]

Each node of a decision tree represents a test on a data attribute. The outcome of the test determines which branch the logic follows, leading either to another node or to a final result.
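
One way to picture this is as a nested node structure, where each internal node holds a test and each end node holds a result. A minimal sketch; the class and field names here are assumptions for illustration, not any library's API:

# Illustrative node structure; the names here are assumptions, not a library API.
class Node:
    def __init__(self, test=None, branches=None, result=None):
        self.test = test          # function applied to a data record
        self.branches = branches  # maps a test outcome to a child Node
        self.result = result      # set only on end (leaf) nodes

def evaluate(node, record):
    # Follow test outcomes down the branches until an end node is reached.
    while node.result is None:
        node = node.branches[node.test(record)]
    return node.result

# Example: a one-level tree that tests a single attribute.
leaf_yes, leaf_no = Node(result="yes"), Node(result="no")
root = Node(test=lambda r: r["x"] < 10, branches={True: leaf_yes, False: leaf_no})
print(evaluate(root, {"x": 3}))   # prints "yes"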

Tree Optimization

Trees can be optimized using techniques such as the following (illustrated in the sketch after the list):

  • Boosted Trees - building an ensemble of trees, as in AdaBoost, where the outputs of the individual trees are combined as a weighted sum

  • Branch Pruning - removing branches that have weak or no contribution to the predictive power of the tree

  • Multiple Trees - such as in the Random Forest algorithm
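
All three techniques are available in scikit-learn. A minimal sketch, reusing the iris data from the example later in this article; the parameter values are illustrative, not tuned:

# Illustrative scikit-learn calls for the three techniques; values are not tuned.
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Boosted trees: an ensemble whose members' outputs are combined as a weighted sum.
boosted = AdaBoostClassifier(n_estimators=50).fit(X, y)

# Branch pruning: cost-complexity pruning removes branches with weak contribution.
pruned = DecisionTreeClassifier(ccp_alpha=0.01).fit(X, y)

# Multiple trees: a random forest combines the votes of many randomized trees.
forest = RandomForestClassifier(n_estimators=100).fit(X, y)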

Python Example

"""
decision_tree_classification_with_scikit_learn.py
creates, trains, and runs predictions with a decision tree classifier
"""

# Import libraries.
from sklearn.datasets import load_iris
from sklearn import tree
import graphviz

# Load the iris dataset: feature measurements X and class labels y.
X, y = load_iris(return_X_y=True)

# Create a decision tree classifier.
classifier = tree.DecisionTreeClassifier()

# Train the classifier on the loaded data.
classifier = classifier.fit(X, y)

# Export the classifier structure.
dot_data = tree.export_graphviz(classifier, out_file=None)

# Graph the structure.
graph = graphviz.Source(dot_data)

# Render the graph to a PDF file (iris.pdf).
graph.render("iris")

# Create a modified input array by shifting every measurement down by 0.1.
X_new = X - 0.1

# Predict labels for the original training data and for the shifted data.
print(classifier.predict(X))
print(classifier.predict(X_new))

Outputs are shown below:

Graph:
  • The Gini impurity shown at each node measures how mixed the class labels are at that node. It is the cost function used to evaluate candidate splits between branches of the tree (a worked computation follows the list).

  • The training algorithm recursively partitions the feature space so that samples with the same labels are grouped together, stopping when nodes are pure or the maximum allowable depth is reached.
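
As a concrete check on the impurity values in the rendered graph, Gini impurity for a node is 1 minus the sum of the squared class proportions. A small sketch:

# Gini impurity: 1 minus the sum of squared class proportions at a node.
def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(gini([50, 50, 50]))  # iris root node (50 samples per class): ~0.667
print(gini([50, 0, 0]))    # a pure leaf: 0.0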

Prediction results using X:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

Prediction results using X_new (notice that some values have changed, because shifting the measurements moved some samples across the tree's split thresholds):
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2,
       2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1])
