Nearest Neighbors

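Nearest Neighbors algorithms use labeled training data (feature values together with their known classes) to divide the feature space into class regions. A new, unseen data point is assigned the class of the region it falls into. That class is decided by the classes of the training points nearest to it; in k-nearest neighbors, it is the most common class among the k closest training points.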

Nearest Neighbors is a supervised learning algorithm.
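The classification idea described above can be sketched from scratch in a few lines. This is a minimal illustration, not how scikit-learn implements `KNeighborsClassifier`. The function name `knn_predict` and the toy data are hypothetical. Distances are Euclidean, and each new point takes the most common class among its k nearest training points, with every neighbor's vote counted equally.

```python
import numpy as np


def knn_predict(train_features, train_classes, new_points, k=3):
    """Hypothetical helper: predict a class for each row of new_points by
    majority vote among its k nearest training points (Euclidean distance)."""
    predictions = []
    for point in new_points:
        # Distance from this point to every training point.
        distances = np.linalg.norm(train_features - point, axis=1)
        # Indices of the k closest training points.
        nearest = np.argsort(distances)[:k]
        # Count votes per class label; argmax picks the most common label
        # (on a tie, the smallest label wins).
        votes = np.bincount(train_classes[nearest])
        predictions.append(int(np.argmax(votes)))
    return np.array(predictions)


# Hypothetical toy data: two features, two classes (0 and 1).
train_features = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                           [6.0, 6.0], [7.0, 5.5], [6.5, 7.0]])
train_classes = np.array([0, 0, 0, 1, 1, 1])
new_points = np.array([[1.2, 1.4], [6.8, 6.1], [3.0, 3.0]])

print(knn_predict(train_features, train_classes, new_points, k=3))
# [0 1 0]
```

The point (3.0, 3.0) lies between the two groups, but its three nearest training points all belong to class 0, so it is assigned class 0. Setting `weights='distance'` in scikit-learn instead weights each neighbor's vote by the inverse of its distance; the sketch above only covers the `'uniform'` case used in the Python example.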

In the diagram below, generated by the nearest neighbors model in the Python example, the light red, yellow, and blue background shapes represent the three predicted class regions. The darker red, yellow, and blue dots represent the data used to train the model. Most dots lie inside the region of the matching color; only a small number of yellow and blue dots fall outside it.

[Figure: K-Nearest Neighbors Plot for 3 Classes]
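The exceptions visible in the diagram can also be counted. The sketch below assumes the same iris data, first two features, and `KNeighborsClassifier(3, weights='uniform')` settings as the Python example. It predicts the class of each training point and counts the points whose prediction differs from their true class, which corresponds to a dot drawn on a differently colored background. Because each training point is among its own nearest neighbors, the resulting training accuracy is optimistic and is not an estimate of accuracy on new data.

```python
from sklearn import datasets, neighbors

# Same data and settings as the Python example: first two iris features, k = 3.
iris = datasets.load_iris()
feature_data = iris.data[:, :2]
target_classes = iris.target

model = neighbors.KNeighborsClassifier(3, weights='uniform')
model.fit(feature_data, target_classes)

# Predict each training point; a mismatch is a dot outside its own color region.
training_predictions = model.predict(feature_data)
exceptions = training_predictions != target_classes
print("dots outside their class region:", int(exceptions.sum()))

# Training accuracy: optimistic, since each point is among its own neighbors.
accuracy = model.score(feature_data, target_classes)
print("training accuracy:", round(accuracy, 3))
```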

Python Example


"""
nearest_neighbors_classifier.py
trains a k-nearest neighbors classifier on two iris features;
the plot output shows the predicted class regions as background colors
and the training data as dots
"""

# Import needed libraries.
import numpy as np
import matplotlib.pyplot as plotlib
from matplotlib.colors import ListedColormap
from sklearn import neighbors, datasets

# Define parameters.
n_neighbors = 3
weights = 'uniform'
step_size = .02
number_of_features = 2
number_of_classes = 3
plot_graph_margin = 1
feature_1_index = 0
feature_2_index = 1
class_1_background_color = 'violet'
class_2_background_color = 'moccasin'
class_3_background_color = 'skyblue'
class_1_data_point_color = 'deeppink'
class_2_data_point_color = 'orange'
class_3_data_point_color = 'dodgerblue'
plot_edge_color = 'k'
data_point_plot_size = 30
plot_title = "K-Nearest Neighbors Plot for 3 Classes"

# Load the iris dataset (150 samples, 4 features, 3 classes).
iris = datasets.load_iris()

# Keep the first two features (sepal length and sepal width) for 2-D plotting.
feature_data = iris.data[:, :number_of_features]
print("feature data:")
print(feature_data)

# Create an array of target classes.
# There are 3 different classes in the iris data (labels 0, 1, and 2).
target_classes = iris.target
print("target classes:")
print(target_classes)

# Assign minimum and maximum values for the plot.
plot_x_min = feature_data[:, feature_1_index].min() - plot_graph_margin
plot_x_max = feature_data[:, feature_1_index].max() + plot_graph_margin
plot_y_min = feature_data[:, feature_2_index].min() - plot_graph_margin
plot_y_max = feature_data[:, feature_2_index].max() + plot_graph_margin

# Get grid values for plotting classification prediction regions.
grid_x_values, grid_y_values = np.meshgrid(
    np.arange(plot_x_min, plot_x_max, step_size),
    np.arange(plot_y_min, plot_y_max, step_size))

# Flatten the arrays.
grid_x_flattened = grid_x_values.ravel()
grid_y_flattened = grid_y_values.ravel()

# Pair the flattened coordinates column-wise: one (x, y) row per grid point.
prediction_input_values = np.c_[grid_x_flattened, grid_y_flattened]

# Instantiate a k-nearest neighbors model.
model = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)

# Train the model.
model.fit(feature_data, target_classes)

# Predict the class of every grid point.
predicted_values = model.predict(prediction_input_values)

# Create plot color maps.
class_background_colors = ListedColormap([class_1_background_color,
                                          class_2_background_color,
                                          class_3_background_color])
data_point_colors = ListedColormap([class_1_data_point_color,
                                    class_2_data_point_color,
                                    class_3_data_point_color])

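# Reshape the predictions to the grid shape and plot them as background colors.
predicted_values = predicted_values.reshape(grid_x_values.shape)
plotlib.figure()
# shading='auto' centers one color cell on each grid point, which suits
# X, Y, and C arrays of the same shape (available in matplotlib 3.3+).
plotlib.pcolormesh(grid_x_values,
                   grid_y_values,
                   predicted_values,
                   cmap=class_background_colors,
                   shading='auto')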

# Plot the input data points.
plotlib.scatter(feature_data[:, feature_1_index],
                feature_data[:, feature_2_index],
                c=target_classes,
                cmap=data_point_colors,
                edgecolor=plot_edge_color,
                s=data_point_plot_size)
plotlib.xlim(grid_x_values.min(), grid_x_values.max())
plotlib.ylim(grid_y_values.min(), grid_y_values.max())
plotlib.title(plot_title)

# Display the plot.
plotlib.show()
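The grid steps in the code above (`np.meshgrid`, `ravel`, `np.c_`, and the final `reshape`) can be hard to picture at full size. The sketch below repeats them on a hypothetical 3-by-2 integer grid, with a stand-in sum (`fake_predictions`) in place of `model.predict`, so every intermediate array is small enough to read.

```python
import numpy as np

# Tiny integer grid standing in for the plot area: x = 0, 1, 2 and y = 0, 1.
grid_x_values, grid_y_values = np.meshgrid(np.arange(3), np.arange(2))
print(grid_x_values.shape)  # (2, 3): one row per y value, one column per x value

# Flatten both grids and pair them column-wise: one (x, y) row per grid point.
prediction_input_values = np.c_[grid_x_values.ravel(), grid_y_values.ravel()]
print(prediction_input_values.tolist())
# [[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [2, 1]]

# Hypothetical stand-in for model.predict(): one value per input row.
fake_predictions = prediction_input_values.sum(axis=1)

# Reshape back to the grid so each prediction lines up with its (x, y) cell.
print(fake_predictions.reshape(grid_x_values.shape).tolist())
# [[0, 1, 2], [1, 2, 3]]
```

Reshaping with `grid_x_values.shape` works because `ravel` and `reshape` both use row-major order by default, so each prediction returns to the cell its input came from.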

The printed output of nearest_neighbors_classifier.py is below:

feature data:
[[5.1 3.5]
 [4.9 3. ]
 [4.7 3.2]
 [4.6 3.1]
 [5.  3.6]
 [5.4 3.9]
 [4.6 3.4]
  . . .
 [6.8 3.2]
 [6.7 3.3]
 [6.7 3. ]
 [6.3 2.5]
 [6.5 3. ]
 [6.2 3.4]
 [5.9 3. ]]
target classes:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
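The integer labels 0, 1, and 2 in the target classes output stand for the three iris species, and the two feature columns are sepal measurements. A short check, assuming the same scikit-learn `datasets.load_iris()` loader used in the example, prints the feature names, the label-to-species mapping, and the number of samples per class.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# The two columns kept in feature_data (iris.data[:, :2]).
print(iris.feature_names[:2])
# ['sepal length (cm)', 'sepal width (cm)']

# Integer class label -> species name.
for label, name in enumerate(iris.target_names):
    print(label, name)
# 0 setosa
# 1 versicolor
# 2 virginica

# Number of samples in each class.
class_counts = np.bincount(iris.target)
print(class_counts)
# [50 50 50]
```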
