Cross Decomposition

Cross decomposition algorithms are useful for finding relationships between two multivariate datasets. A leading example is partial least squares (PLS):

PLS is used to find the fundamental relations between two matrices (X and Y). PLS regression is particularly well suited when the matrix of predictors has more variables than observations, and when there is multicollinearity among the X values. By contrast, standard regression will fail in these cases unless it is regularized.
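
A minimal sketch of this point (my own illustration, not from the source): when the predictors are wide and highly collinear, ordinary least squares is ill-posed, but PLS regression still fits cleanly by projecting onto a small number of latent components. The data and variable names below are invented for illustration.

import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)

# 10 observations of 50 highly collinear predictors (more variables than
# observations), driven by 2 underlying latent factors plus a little noise.
latent = rng.normal(size=(10, 2))
X_wide = latent @ rng.normal(size=(2, 50)) + 0.01 * rng.normal(size=(10, 50))
Y_wide = latent @ np.array([[1.5], [-2.0]]) + 0.01 * rng.normal(size=(10, 1))

# PLS regresses on 2 latent components rather than on all 50 raw predictors.
pls = PLSRegression(n_components=2)
pls.fit(X_wide, Y_wide)

# R^2 on the training data; close to 1 for this synthetic example.
print(pls.score(X_wide, Y_wide))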

Figure: scikit-learn's Partial Least Squares comparison example, illustrating varying levels of correlation between the train and test datasets (source: scikit-learn).

Mathematical Model

PLS decomposes X and Y into score matrices T and U and loading matrices P and Q, so that X = T Pᵀ + E and Y = U Qᵀ + F (E and F are residual matrices). The decompositions are made so as to maximize the covariance between the scores T and U.
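
As a concrete (and hedged) sketch of what T and U are in scikit-learn terms: after fitting, the score matrices can be obtained with the model's transform method, and the corresponding columns of T and U are typically highly correlated because the components are chosen to maximize their covariance. The snippet below reuses the same X and Y data as the example in the next section.

import numpy as np
from sklearn.cross_decomposition import PLSRegression

X = [[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [2., 5., 4.]]
Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]

# Fit a 2-component PLS model and extract the score matrices T and U.
pls = PLSRegression(n_components=2).fit(X, Y)
T, U = pls.transform(X, Y)

# The corresponding columns of T and U are highly correlated.
for k in range(T.shape[1]):
    print(k, np.corrcoef(T[:, k], U[:, k])[0, 1])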

Python Example

"""
partial_least_squares_with_scikit_learn.py
correlates and makes predictions on data in different dimension spaces
"""

# Import the scikit learn PLS module.
from sklearn.cross_decomposition import PLSRegression

# Define X and Y data.
X = [[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [2., 5., 4.]]
Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]

# Instantiate a PLSRegression model.
pls2 = PLSRegression(n_components=2)

# Fit the model to the data.
pls2.fit(X, Y)

# Make a prediction based on new input data.
X_new = [[2., 1., 1.], [2., 1., 0.], [5., 3., 2.], [1., 4., 2.]]
Y_pred = pls2.predict(X_new)

# Display the result.
print(Y_pred)

Results are shown below:

[[ 4.3062782   4.24098373]
 [ 3.11475114  3.00799304]
 [12.05128289 12.11358741]
 [ 6.87275405  6.98368783]]
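
A short follow-up sketch (not part of the original listing), reusing the pls2, X, and Y objects defined above: a regressor's score method reports the R^2 of the fit, which gives a quick check of how well the two-component model captures the training data.

# R^2 of the fit on the training data (1.0 would be a perfect fit).
print(pls2.score(X, Y))

# Predictions for the training inputs, for comparison against Y.
print(pls2.predict(X))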

References