how to use ridgeParameters for ridge_regression_training()? #620
bobjiang82 started this conversation in General
-
Hi @bobjiang82
-
Hi @bobjiang82

import daal4py as d4p
import numpy as np

# let's try to use pandas' fast csv reader
try:
    import pandas

    def read_csv(f, c, t=np.float64):
        return pandas.read_csv(f, usecols=c, delimiter=',', header=None, dtype=t)
except ImportError:
    # fall back to numpy loadtxt
    def read_csv(f, c, t=np.float64):
        return np.loadtxt(f, usecols=c, delimiter=',', ndmin=2)

from sklearn.linear_model import Ridge
from numpy.testing import assert_array_almost_equal

def main(readcsv=read_csv, method='defaultDense'):
    infile = "./data/batch/linear_regression_train.csv"
    testfile = "./data/batch/linear_regression_test.csv"

    # Read data. Let's have 10 independent,
    # and 2 dependent variables (for each observation)
    indep_data = readcsv(infile, range(10))
    dep_data = readcsv(infile, range(10, 12))
    # read test data (with the same #features); ptdata is read for
    # completeness but not used below
    pdata = readcsv(testfile, range(10))
    ptdata = readcsv(testfile, range(10, 12))

    for alpha in [0.1, 0.5, 1, 50, 100]:
        for intercept in [False, True]:
            # one ridge parameter per dependent variable, passed as a 1 x 2 table
            ridge_params = np.asarray([alpha] * 2, dtype=np.float64)
            ridge_params = ridge_params.reshape((1, -1))
            # Configure a ridge regression training object
            train_algo = d4p.ridge_regression_training(ridgeParameters=ridge_params,
                                                       interceptFlag=intercept)
            # Now train/compute; the result provides the model for prediction
            train_result = train_algo.compute(indep_data, dep_data)
            # Reference fit and prediction with scikit-learn's Ridge
            res_sk = Ridge(alpha=alpha, fit_intercept=intercept).fit(
                indep_data, dep_data).predict(pdata)
            # Now let's do some prediction
            predict_algo = d4p.ridge_regression_prediction()
            # now predict using the model from the training above
            predict_result = predict_algo.compute(pdata, train_result.model)
            # The predictions should have the expected shape and match sklearn
            assert predict_result.prediction.shape == (pdata.shape[0], dep_data.shape[1])
            assert_array_almost_equal(predict_result.prediction, res_sk, decimal=11)

if __name__ == "__main__":
    main()

The code executes successfully, without any assertion failures. This also applies to the single-output case (one dependent column).
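For the single-output case, the only change is the shape of the ridge parameter table. A minimal sketch, assuming the same data file with column 10 as the single response:

import daal4py as d4p
import numpy as np

# 10 feature columns, one response column
X = np.loadtxt("./data/batch/linear_regression_train.csv", delimiter=',',
               usecols=range(10), ndmin=2)
y = np.loadtxt("./data/batch/linear_regression_train.csv", delimiter=',',
               usecols=[10], ndmin=2)

# a 1 x 1 table holding the single ridge parameter
ridge_params = np.asarray([[0.5]], dtype=np.float64)
train_algo = d4p.ridge_regression_training(ridgeParameters=ridge_params, interceptFlag=True)
train_result = train_algo.compute(X, y)
prediction = d4p.ridge_regression_prediction().compute(X, train_result.model).prediction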
-
I ran examples/ridge_regression_batch.py after adding ridgeParameters as below:
# one ridge parameter per dependent variable (2 outputs), as a 1 x 2 table
ridge_params = np.asarray([100.0]*2, dtype=np.double)
ridge_params = ridge_params.reshape((1, -1))
train_algo = d4p.ridge_regression_training(ridgeParameters=ridge_params, interceptFlag=True)
No matter how I change the ridge parameters, the coefficients of the trained models stay almost the same; for example, I tried [0.1]*2, [1]*2, and [100.0]*2.
The coefficients also match those trained by OLS linear regression (linear_regression_batch.py).
I would expect the coefficients to shrink noticeably as the ridge parameters increase.
With Spark MLlib, the coefficients of an OLS linear regression model differ markedly from those of ridge regression, and the ridge parameter in the MLlib ridge regression example is 0.3 (compared with daal4py's default of 1).
So, is it a usage issue or a bug?
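For reference, here is the quick check I would run to tell the two cases apart: train with very different penalties and print the fitted coefficients. This is only a sketch, assuming the trained model exposes its coefficients through the Beta property as daal4py's linear models do, and reusing the example's training data:

import daal4py as d4p
import numpy as np

X = np.loadtxt("./data/batch/linear_regression_train.csv", delimiter=',',
               usecols=range(10), ndmin=2)
y = np.loadtxt("./data/batch/linear_regression_train.csv", delimiter=',',
               usecols=range(10, 12), ndmin=2)

for alpha in [0.1, 1.0, 100.0]:
    # one penalty per response column, passed as a 1 x 2 table
    params = np.full((1, 2), alpha, dtype=np.float64)
    result = d4p.ridge_regression_training(ridgeParameters=params,
                                           interceptFlag=True).compute(X, y)
    # the coefficients should shrink toward zero as alpha grows
    print(alpha, result.model.Beta)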