In this post I provide some quick code snippets demonstrating how to use my C3 library to perform low-rank regression of scattered data. The approach is based on our recently accepted paper:

In that paper, we demonstrate that this approach is effective on a wide range of data sets and compare it to many machine learning algorithms. The results provide an initial indication that low-rank formats have the potential to be a powerful general-purpose machine learning modeling format. In later posts, I will discuss in more detail the exact type of low-rank model used and the details of its implementation. The example in this post is intended to get someone started quickly and to demonstrate the parallel to other machine learning techniques.

I assume that you have installed the library with the Python interface as described in the README. First let's load numpy, matplotlib, and scikit-learn. We will compare our approach on the "diabetes" data set from scikit-learn.

1: """ Author: Alex Gorodetsky -- goroda@umich.edu """ 2: """ License: BSD 3 clause """ 3: 4: import numpy as np 5: import matplotlib.pyplot as plt 6: from sklearn import linear_model, neural_network 7: from sklearn import datasets 8: from sklearn.metrics import mean_squared_error, r2_score

Next we import `c3py` and set up a function that calls the library to fit the data.

```python
import c3py

def fit_ft(xdata, ydata, poly_order=3, initial_rank=1):
    """ Setup and solve a regression problem in low-rank format """
    ndata, dim = xdata.shape

    # setup lower and upper bounds of the approximation
    lbounds = np.min(xdata, axis=0) - 0.5
    ubounds = np.max(xdata, axis=0) + 0.5

    # Initialize the parameters of a low-rank model.
    # Each dimension is parameterized by polynomials
    # of order poly_order, so each contains poly_order+1 unknowns
    ftrain = c3py.FunctionTrain(dim)
    for dim_index, lb, ub in zip(range(dim), lbounds, ubounds):
        ftrain.set_dim_opts(dim_index, "legendre", lb, ub, poly_order+1)

    # setup the initial rank of the approximation
    ranks = [1] + [initial_rank] * (dim-1) + [1]
    ftrain.set_ranks(ranks)

    # Build the model while adapting the rank
    ftrain.build_data_model(ndata, xdata, ydata,
                            adaptrank=1, kickrank=1,             # adaptation parameters
                            obj="LS_SPARSECORE", regweight=1e0)  # sparsity regularization
    return ftrain
```
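To make the rank specification concrete, here is a small sketch (independent of `c3py`) of what the `ranks` list looks like for the 10-feature diabetes data, using a hypothetical `initial_rank` of 2 for illustration:

```python
dim = 10          # number of features in the diabetes data
initial_rank = 2  # hypothetical starting rank, for illustration only

# Boundary ranks of a function train are always 1; the interior
# ranks start at initial_rank and may grow during rank adaptation
ranks = [1] + [initial_rank] * (dim - 1) + [1]
print(ranks)  # [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1]
```

A `dim`-dimensional function train has `dim + 1` ranks, so the list has 11 entries here.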

Next we load the diabetes data set and split it into training and testing data. This data set has a total of 442 instances, each described by 10 features. We take the last 20 instances for testing and train on the rest. This is the same setting as provided in this example.

```python
diabetes = datasets.load_diabetes()
X = diabetes.data
Y = diabetes.target

ndata, nfeatures = X.shape
ntrain = ndata - 20
ntest = ndata - ntrain

xtrain = X[:ntrain, :]
xtest = X[ntrain:, :]
ytrain = Y[:ntrain]
ytest = Y[ntrain:]
```

We use scikit-learn to train three baseline models for comparison: a baseline guess that simply predicts the mean of the training data, a linear regression model, and a neural network.

```python
# Baseline guess: predict the mean of the training targets
y_pred_baseline = np.mean(ytrain) * np.ones(ytest.shape)
mse_baseline = mean_squared_error(y_pred_baseline, ytest)
print("Mean Squared Error of the baseline fit is {0:5f}".format(mse_baseline))

# Linear regression
linmod = linear_model.LinearRegression()  # create the object
linmod.fit(xtrain, ytrain)                # train
y_pred_lin = linmod.predict(xtest)        # test
mse_lin = mean_squared_error(y_pred_lin, ytest)
print("Mean Squared Error of the linear fit is {0:5f}".format(mse_lin))

# Neural network
neural = neural_network.MLPRegressor(activation='relu', solver='lbfgs')  # lbfgs because of small data
neural.fit(xtrain, ytrain)
y_pred_nn = neural.predict(xtest)
mse_nn = mean_squared_error(y_pred_nn, ytest)
print("Mean Squared Error of the neural network fit is {0:5f}".format(mse_nn))
```

Finally, we train and evaluate the function train (FT):

```python
# Low-rank regression
ft = fit_ft(xtrain, ytrain)
y_pred_ft = ft.eval(xtest)
mse_ft = mean_squared_error(y_pred_ft, ytest)
print("Mean Squared Error of the low-rank fit is {0:5f}".format(mse_ft))
```

The output from this code is:

```
Mean Squared Error of the baseline fit is 5568.964625
Mean Squared Error of the linear fit is 2004.567603
Mean Squared Error of the neural network fit is 1806.851797
Mean Squared Error of the low-rank fit is 1814.017628
```
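The `r2_score` function imported earlier is never actually used; it can report the coefficient of determination alongside the MSE. A minimal sketch on made-up values (the arrays below are illustrative stand-ins, not the model outputs above -- substitute `ytest` and `y_pred_ft` in practice):

```python
import numpy as np
from sklearn.metrics import r2_score

# Illustrative values only; R^2 = 1 - SS_res / SS_tot, and 1.0 is a perfect fit
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
print("R^2 = {0:.3f}".format(r2_score(y_true, y_pred)))  # R^2 = 0.949
```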

The neural network results vary from run to run, most likely due to the randomness of the weight initialization. The linear, neural network, and low-rank models all provide approximately the same quality of fit (recall that there are only 20 testing samples!).
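If the run-to-run variation is a nuisance, one standard fix (not in the original snippet) is to pin scikit-learn's random seed via the `random_state` argument; repeated fits then produce identical predictions:

```python
import numpy as np
from sklearn import datasets, neural_network

diabetes = datasets.load_diabetes()
X, Y = diabetes.data, diabetes.target
ntrain = X.shape[0] - 20  # same last-20 split as above

def fit_nn(seed):
    # random_state fixes the weight initialization (the seed value is arbitrary)
    nn = neural_network.MLPRegressor(activation='relu', solver='lbfgs',
                                     random_state=seed)
    nn.fit(X[:ntrain], Y[:ntrain])
    return nn.predict(X[ntrain:])

print(np.allclose(fit_nn(0), fit_nn(0)))  # True: repeated runs match exactly
```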