Programming Process

The Machine Learning programming process is more than just the development of program code. Machine Learning as a whole is very exploratory and experimental. Scientific methods play an important role.

Programming and Scientific Methods

The programming process can be understood in the context of scientific methods, the steps of which include:

  1. Question: a condition, problem or opportunity to be explored and answered

  2. Hypothesis: a potential answer to the question

  3. Prediction: a predicted result of the hypothesis

  4. Testing: experiments to determine whether results of the solution meet predictions

  5. Analysis: an examination of testing results

In the context of scientific methods, programming often includes aspects such as:

  1. Question: Definition, Constraints, Inputs Given, Desired Outputs

  2. Hypothesis: Data Structure, Basic Approach, Processing Diagram, Processing Path

  3. Prediction: Measurement of Testing Results

  4. Testing: Code Outline, Code Development, Code Testing

  5. Analysis: Code Verification, Code Refinement

Programming processes range from those for large application systems to small functions.

Programming Code Example

In the sections below, the example is to efficiently find a specific value in a sorted array.

Question

A question can come in a number of forms, including:

  • Explicit and formal requirements.

  • A general statement that requires clarification.

Definition

  • Can a function be coded to efficiently find a specific value in a sorted array?

Constraints

  • The solution should perform at Big O (log (n)).

  • The programming language used should be Python.

Inputs

  • A sorted array of numbers.

  • The target number to find.

Outputs

  • The position in the array of the input target number.

  • -1 if the target number is not found.

Hypothesis

A hypothesis in the context of programming is a proposed solution to the question. For very simple programming tasks, the solution may be obvious. For more difficult tasks, finding the solution may require testing a number of approaches.

Data Structure

In general, data structures considered can include:

  • arrays

  • lists

  • heaps

  • queues

  • binary trees

  • binary search trees

  • linked lists

  • associative arrays, such as a dictionary in Python

  • databases

  • graphs

For our example, the data structure is given as part of the problem statement:

  • an array of numbers.

Basic Approach

The basic approach to a solution will be highly dependent on the Objective Definition. In the case of our search objective, options include:

  • sequential progression over the elements

  • interval search over the elements, such as a binary search

Sequential search has an efficiency of O(n) while a binary search is at O(log(n)). Because one of our constraints is performance at O(log(n)), and the array is sorted, we’ll use the binary search approach.

Processing Diagram

As illustrated in the diagram below, a binary search of an array is a process of successively finding the mid-point of array cells and taking action based on the value in the mid-point cell relative to the target cell value to locate:

  • equal to: return that cell location

  • greater than: process the right sub-array

  • less than: process the left sub-array

This is repeated until the target value is found or the cells to process are exhausted and the value is not found.

Main Function Type

Main function types for our example include:

Because our processing path includes repeating the same basic process of halving and testing, we’ll use recursion.

Testing

Code Outline

A code outline is a convenient way way to begin coding. It allows defining a logic sequence without concern for code syntax. As shown below, code comments can be used which will translate easily to actual code:

# Recursive binary_search function.
    # Exit the search if right position < left position - element not found.
    # Calculate mid-point of the array elements to be searched.
    # If target element is found at the mid-point, return that array position.
    # If target element < mid-point value, process the left sub-array.
        # Execute binary_search on left sub-array.
    # If target element > the mid-point value, process the right sub-array.
        # Execute binary_search on right sub-array.
# Function to execute the recursive binary_search function.
    # Set initial left and right positions.
        # Initial left position = 0.
        # Initial right position = 16.
    # Execute the binary_search function using the initial positions.

Code Development

The code below implements the basics from the code outline above:

# Import needed libraryies.
import numpy as np

# Initialize global values.
number_of_searches = 0
number_of_iterations_in_a_search = 0
number_of_iterations_total = 0


# Recursive function to perform a binary search to find the array position of a given number.
def binary_search(data_array, left_position, right_position, target_number):
    global number_of_iterations_in_a_search
    global number_of_iterations_total
    print("left position: " + str(left_position) + "  right position: " + str(right_position))
    number_of_iterations_in_a_search += 1
    number_of_iterations_total += 1

    # Exit the search if the element was not found.
    if right_position < left_position:
        return -1

    # Calculate the mid-point of the array elements to be searched.
    mid_point = left_position + ((right_position - left_position) // 2)
    print("midpoint position: " + str(mid_point) + "   midpoint value: " + str(data_array[mid_point]))

    # If the target element is found at the mid-point, return that array position.
    if data_array[mid_point] == target_number:
        return mid_point

    # If the target element is smaller than the mid-point value, process the left sub-array.
    elif data_array[mid_point] > target_number:
        print("Target number smaller than mid-point number.")
        return binary_search(data_array, left_position, mid_point - 1, target_number)

    # If the target element is larger than the mid-point value, process the right sub-array.
    else:
        print("Target number larger than mid-point number.")
        return binary_search(data_array, mid_point + 1, right_position, target_number)


# Function to perform a binary search given an input array and number to find.
def execute_binary_search(array, number_to_find):
    global number_of_searches
    global number_of_iterations_in_a_search
    print("\n" + "Test for Number: " + str(number_to_find) + " -------------------------")
    initial_left_position = 0
    initial_right_position = len(array) - 1
    number_of_searches += 1
    number_of_iterations_in_a_search = 0
    position = binary_search(array, initial_left_position, initial_right_position, number_to_find)
    print("Number of iterations in the search: " + str(number_of_iterations_in_a_search))
    if position != -1:
        print ("NUMBER FOUND AT POSITION: " + str(position))
    else:
        print ("NUMBER WAS NOT FOUND.")


# Define the input data array, which must be is a sorted order.
input_array = [1, 3, 11, 24, 33, 43, 45, 49, 54, 60, 68, 76, 82, 84, 93, 97, 99]

# Execute the binary search
execute_binary_search(input_array, 43)

Code Testing

The code below tests for accurately finding numbers in the array and edge cases of the results of attempting the location of numbers not in the array:

# Execute binary search tests for numbers in the input data array.
print("\n" + "\n" + "BINARY SEARCH TEST ============================================")
execute_binary_search(input_array, 43)
execute_binary_search(input_array, 93)
execute_binary_search(input_array, 54)

# Execute binary search tests for edge cases of numbers not in the data array.
execute_binary_search(input_array, 44)
execute_binary_search(input_array, 0)
execute_binary_search(input_array, 100)
execute_binary_search(input_array, 83)

Analysis

Code Results Verification

The code below verifies that that performance is O(log(n)):

# Calculate execution performance of the binary search, which should be O(log(n)).
size_of_input_array = len(input_array)
log_of_size_input_array = np.log2(size_of_input_array)
average_number_of_iterations_in_a_search = float(number_of_iterations_total) / float(number_of_searches)
deviation_from_log_of_n_performance = \
    average_number_of_iterations_in_a_search - log_of_size_input_array

# Print performance results.
print("\n" + "PERFORMANCE OF THE BINARY SEARCH ======================")
print("Size of the input array: " + str(size_of_input_array))
print("Log of the size of the input array: " + str(log_of_size_input_array))
print("Total number of search executions: " + str(number_of_searches))
print("Total number of search iterations: " + str(number_of_iterations_total))
print("Average number of iterations in a search: " + str(average_number_of_iterations_in_a_search))
print("Deviation of average iterations from log(n) performance: " + str(deviation_from_log_of_n_performance))

Output of the code is:

PERFORMANCE OF THE BINARY SEARCH ======================
Size of the input array: 17
Log of the size of the input array: 4.087462841250339
Total number of search executions: 7
Total number of search iterations: 28
Average number of iterations in a search: 4.0

Code Refinement

Code refinement can include:

  • Correcting coding errors.

  • Enhancing code comments.

  • Addressing additional edge cases.

Complete Code

Below is the code after applying the processes above.

To download the code, click here.

"""
binary_search.py
performs a binary search using a recursive function and evaluates the results
"""

# Import needed libraryies.
import numpy as np

# Initialize global values.
number_of_searches = 0
number_of_iterations_in_a_search = 0
number_of_iterations_total = 0


# Recursive function to perform a binary search to find the array position of a given number.
def binary_search(data_array, left_position, right_position, target_number):
    global number_of_iterations_in_a_search
    global number_of_iterations_total
    print("left position: " + str(left_position) + "  right position: " + str(right_position))
    number_of_iterations_in_a_search += 1
    number_of_iterations_total += 1

    # Exit the search if the element was not found.
    if right_position < left_position:
        return -1

    # Calculate the mid-point of the array elements to be searched.
    mid_point = left_position + ((right_position - left_position) // 2)
    print("midpoint position: " + str(mid_point) + "   midpoint value: " + str(data_array[mid_point]))

    # If the target element is found at the mid-point, return that array position.
    if data_array[mid_point] == target_number:
        return mid_point

    # If the target element is smaller than the mid-point value, process the left sub-array.
    elif data_array[mid_point] > target_number:
        print("Target number smaller than mid-point number.")
        return binary_search(data_array, left_position, mid_point - 1, target_number)

    # If the target element is larger than the mid-point value, process the right sub-array.
    else:
        print("Target number larger than mid-point number.")
        return binary_search(data_array, mid_point + 1, right_position, target_number)


# Function to perform a binary search given an input array and number to find.
def execute_binary_search(array, number_to_find):
    global number_of_searches
    global number_of_iterations_in_a_search
    print("\n" + "Test for Number: " + str(number_to_find) + " -------------------------")
    initial_left_position = 0
    initial_right_position = len(array) - 1
    number_of_searches += 1
    number_of_iterations_in_a_search = 0
    position = binary_search(array, initial_left_position, initial_right_position, number_to_find)
    print("Number of iterations in the search: " + str(number_of_iterations_in_a_search))
    if position != -1:
        print ("NUMBER FOUND AT POSITION: " + str(position))
    else:
        print ("NUMBER WAS NOT FOUND.")


# Define the input data array, which must be is a sorted order.
input_array = [1, 3, 11, 24, 33, 43, 45, 49, 54, 60, 68, 76, 82, 84, 93, 97, 99]

# Execute binary search tests for numbers in the input data array.
print("\n" + "\n" + "BINARY SEARCH TEST ============================================")
execute_binary_search(input_array, 43)
execute_binary_search(input_array, 93)
execute_binary_search(input_array, 54)

# Execute binary search tests for edge cases of numbers not in the data array.
execute_binary_search(input_array, 44)
execute_binary_search(input_array, 0)
execute_binary_search(input_array, 100)
execute_binary_search(input_array, 83)

# Calculate execution performance of the binary search, which should be O(log(n)).
size_of_input_array = len(input_array)
log_of_size_input_array = np.log2(size_of_input_array)
average_number_of_iterations_in_a_search = float(number_of_iterations_total) / float(number_of_searches)
deviation_from_log_of_n_performance = \
    average_number_of_iterations_in_a_search - log_of_size_input_array

# Print performance results.
print("\n" + "PERFORMANCE OF THE BINARY SEARCH ======================")
print("Size of the input array: " + str(size_of_input_array))
print("Log of the size of the input array: " + str(log_of_size_input_array))
print("Total number of search executions: " + str(number_of_searches))
print("Total number of search iterations: " + str(number_of_iterations_total))
print("Average number of iterations in a search: " + str(average_number_of_iterations_in_a_search))
print("Deviation of average iterations from log(n) performance: " + str(deviation_from_log_of_n_performance))
Output results are shown below:

BINARY SEARCH TEST ============================================

Test for Number: 43 -------------------------
left position: 0  right position: 16
midpoint position: 8   midpoint value: 54
Target number smaller than mid-point number.
left position: 0  right position: 7
midpoint position: 3   midpoint value: 24
Target number larger than mid-point number.
left position: 4  right position: 7
midpoint position: 5   midpoint value: 43
Number of iterations in the search: 3
NUMBER FOUND AT POSITION: 5

Test for Number: 93 -------------------------
left position: 0  right position: 16
midpoint position: 8   midpoint value: 54
Target number larger than mid-point number.
left position: 9  right position: 16
midpoint position: 12   midpoint value: 82
Target number larger than mid-point number.
left position: 13  right position: 16
midpoint position: 14   midpoint value: 93
Number of iterations in the search: 3
NUMBER FOUND AT POSITION: 14

Test for Number: 54 -------------------------
left position: 0  right position: 16
midpoint position: 8   midpoint value: 54
Number of iterations in the search: 1
NUMBER FOUND AT POSITION: 8

Test for Number: 44 -------------------------
left position: 0  right position: 16
midpoint position: 8   midpoint value: 54
Target number smaller than mid-point number.
left position: 0  right position: 7
midpoint position: 3   midpoint value: 24
Target number larger than mid-point number.
left position: 4  right position: 7
midpoint position: 5   midpoint value: 43
Target number larger than mid-point number.
left position: 6  right position: 7
midpoint position: 6   midpoint value: 45
Target number smaller than mid-point number.
left position: 6  right position: 5
Number of iterations in the search: 5
NUMBER WAS NOT FOUND.

Test for Number: 0 -------------------------
left position: 0  right position: 16
midpoint position: 8   midpoint value: 54
Target number smaller than mid-point number.
left position: 0  right position: 7
midpoint position: 3   midpoint value: 24
Target number smaller than mid-point number.
left position: 0  right position: 2
midpoint position: 1   midpoint value: 3
Target number smaller than mid-point number.
left position: 0  right position: 0
midpoint position: 0   midpoint value: 1
Target number smaller than mid-point number.
left position: 0  right position: -1
Number of iterations in the search: 5
NUMBER WAS NOT FOUND.

Test for Number: 100 -------------------------
left position: 0  right position: 16
midpoint position: 8   midpoint value: 54
Target number larger than mid-point number.
left position: 9  right position: 16
midpoint position: 12   midpoint value: 82
Target number larger than mid-point number.
left position: 13  right position: 16
midpoint position: 14   midpoint value: 93
Target number larger than mid-point number.
left position: 15  right position: 16
midpoint position: 15   midpoint value: 97
Target number larger than mid-point number.
left position: 16  right position: 16
midpoint position: 16   midpoint value: 99
Target number larger than mid-point number.
left position: 17  right position: 16
Number of iterations in the search: 6
NUMBER WAS NOT FOUND.

Test for Number: 83 -------------------------
left position: 0  right position: 16
midpoint position: 8   midpoint value: 54
Target number larger than mid-point number.
left position: 9  right position: 16
midpoint position: 12   midpoint value: 82
Target number larger than mid-point number.
left position: 13  right position: 16
midpoint position: 14   midpoint value: 93
Target number smaller than mid-point number.
left position: 13  right position: 13
midpoint position: 13   midpoint value: 84
Target number smaller than mid-point number.
left position: 13  right position: 12
Number of iterations in the search: 5
NUMBER WAS NOT FOUND.

PERFORMANCE OF THE BINARY SEARCH ======================
Size of the input array: 17
Log of the size of the input array: 4.087462841250339
Total number of search executions: 7
Total number of search iterations: 28
Average number of iterations in a search: 4.0