Regression Analysis

Regression Analysis is a set of statistical processes for estimating the relationships between a dependent variable and one or more independent variables.

The term regression was originally used to describe a biological observation: Francis Galton noted that the heights of offspring tend to ‘regress’ toward the population average. Today, we can think of data points regressing toward a mathematically derived normative line or shape.

For example, in the diagram below, the linear regression line represents a dependent variable y based on the independent variable x.
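As a rough sketch of that kind of fit (not tied to the actual data behind the diagram), assuming NumPy and scikit-learn are available and using made-up observations:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up scattered observations (the "red dots"): one independent
# variable x and one dependent variable y.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50).reshape(-1, 1)
y = 2.5 * x.ravel() + 1.0 + rng.normal(scale=2.0, size=50)

# Fit the regression line (the "blue line") through the points.
model = LinearRegression().fit(x, y)
print(model.intercept_, model.coef_)   # fitted line: y ≈ intercept + coef * x
```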

A Generalized Equation

A very generalized regression analysis equation is:

Y = f(X, β) + e

where the components are described below.

Dependent Variable

The dependent variable (Y) is the result of the regression analysis. It can take the form of data points, lines, or curves. In the diagram above, it’s represented by the blue line.

Function

The function (f) is used to generate the result. In the diagram above, it is a linear function generating a line. Examples of regression analysis functions are shown below. See Linear Regression for a more detailed function example.
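As a minimal illustration, a linear regression function can be written as a line parameterized by an intercept and a slope (hypothetical names below):

```python
def f(x, beta0, beta1):
    """Linear regression function: maps the independent variable x to an
    estimate of the dependent variable using parameters beta0 and beta1."""
    return beta0 + beta1 * x

f(2.0, 1.0, 2.5)   # -> 6.0
```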

Independent Variable(s)

The independent variables (X) are the data from which the dependent variable is derived. In the diagram above, these are represented by the red dots.

Unknown Parameters

The unknown parameters (β) of the function are set during model training to produce the best possible estimate of the dependent variable.
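A minimal sketch of this, assuming the linear function above and ordinary least squares as the training criterion (the data is made up for illustration):

```python
import numpy as np

# Made-up observations for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 3.4, 6.2, 8.1, 11.0])

# Ordinary least squares: choose (beta0, beta1) minimizing the sum of
# squared differences between beta0 + beta1 * x and the observed y.
X = np.column_stack([np.ones_like(x), x])            # design matrix [1, x]
(beta0, beta1), *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta0, beta1)                                   # fitted parameters
```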

Error

Because the function only estimates the dependent variable from the independent variables, each estimate has an associated error (e). In the diagram above, the error is the vertical (y-axis) distance between each red dot and the blue line.
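Continuing the hypothetical example above, the error terms (residuals) are simply the differences between the observed values and the values on the fitted line:

```python
import numpy as np

# Observed points and the least-squares parameter estimates from the previous sketch.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 3.4, 6.2, 8.1, 11.0])
beta0, beta1 = 1.06, 2.45

residuals = y - (beta0 + beta1 * x)      # vertical distance from each point to the line
mse = (residuals ** 2).mean()            # one common summary of the overall error
print(residuals, mse)
```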

Examples of Regression Analysis Functions

Examples of types of regression analysis include:

- Linear Regression
- Polynomial Regression
- Logistic Regression
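A rough sketch of how these function families differ in shape, using made-up parameter values:

```python
import numpy as np

x = np.linspace(-5, 5, 11)

linear     = 1.0 + 2.0 * x                    # straight line
polynomial = 1.0 + 2.0 * x + 0.5 * x ** 2     # curved line (degree-2 polynomial)
logistic   = 1.0 / (1.0 + np.exp(-x))         # S-shaped curve bounded in (0, 1)
```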

Regression vs. Classification

Regression produces continuous numeric predictions, while classification produces discrete category predictions.
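A small sketch of the difference, assuming scikit-learn and made-up data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y_continuous = np.array([1.8, 4.1, 5.9, 8.2])   # continuous targets (regression)
y_labels = np.array([0, 0, 1, 1])               # discrete category targets (classification)

print(LinearRegression().fit(X, y_continuous).predict([[2.5]]))   # ~5.0, a continuous number
print(LogisticRegression().fit(X, y_labels).predict([[2.5]]))     # [0] or [1], a discrete category
```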
