Decision Trees
Decision Trees are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
Decision Trees are used in Decision Tree Learning to create predictive models.
Decision trees are made up of:
- Decision Nodes - usually shown as squares 
- Change Nodes - usually shown as circles 
- End Nodes - usually shown as triangles 
Each node of a decision tree represents a test on a data attribute. The result of the test results in logic flow down a logic branch to another node or a final result.
Tree Optimization
Trees can be optimized using techniques such as:
- Boosted Trees - building an ensemble of trees, such as AdaBoost where a combined output weighted sum is used 
- Branch Pruning - removing branches that have weak or no contribution to the predictive power of the tree 
- Multiple Trees - such as in the Random Forest algorithm 
Python Example
To download the code below, click here.
""" decision_tree_classification_with_scikit_learn.py creates, trains, and prediction tests a decision tree classifier """ # Import libraries. from sklearn.datasets import load_iris from sklearn import tree import graphviz # Load iris measurements test data. X, y = load_iris(return_X_y=True) # Create a decision tree classifier classifier = tree.DecisionTreeClassifier() # Train the classifier with the test data. classifier = classifier.fit(X, y) # Export the classifier structure. dot_data = tree.export_graphviz(classifier, out_file=None) # Graph the structure. graph = graphviz.Source(dot_data) # Output the graph into a pdf file. graph.render("iris") # Create an array of values for a prediction input. X_new = X - .1 # Predict results based on the X training data and modified X_new data. classifier.predict(X) classifier.predict(X_new)
Outputs are shown below:
Graph:- The gini coefficient measures the inequality among values of a frequency distribution. It is used to measure the cost function value to evaluate splits between branches of a tree. 
- A decision tree training algorithm recursively partitions the space until the maximum allowable depth is reached such that the samples with the same labels are grouped together. 
Prediction results using X:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
Prediction results using X_new (notice that some of the values have changed):
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2,
       2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1])
 
            