GridSearchCV does not return training score #66

CBongiova · 2020-02-12T17:26:34Z

Hi,

I am trying to get the training score from GridSearchCV. Looking at the ScikitLearn documentation, I saw that I should normally be able to pass the argument "return_train_score=true".
However, when I try it in Julia, I get a method error.

grid_search = GridSearchCV(clf, param_grid,return_train_score=true)

Does anyone know how to retrieve train scores correctly?
Thanks!

cstjean · 2020-02-12T18:00:41Z

Could you please post the error message? Some minimal code that demonstrates the issue would be appreciated as well!

CBongiova · 2020-02-12T19:08:01Z

@cstjean Thanks for replying

The code is actually pretty straightforward (I took an example from the ScikitLearn website):

```julia
############# Grid search
using ScikitLearn
using ScikitLearn.GridSearch: GridSearchCV
using Printf, Statistics
@sk_import ensemble: RandomForestClassifier

clf = RandomForestClassifier(n_estimators=100, class_weight="balanced_subsample",
                             criterion="entropy", max_depth=30)

# Utility function to report best scores
function report(grid_scores, n_top=3)
    top_scores = sort(grid_scores, by=x->x.mean_validation_score, rev=true)[1:n_top]
    for (i, score) in enumerate(top_scores)
        println("Model with rank:$i")
        @printf("Mean validation score: %.3f (std: %.3f)\n",
                score.mean_validation_score,
                std(score.cv_validation_scores))
        println("Parameters: $(score.parameters)")
        println("")
    end
end

# use a full grid over all parameters
param_grid = Dict("max_features" => [1, 6, 12],
                  "min_samples_leaf" => [1, 5, 10],
                  "min_impurity_decrease" => [0, 0.1, 0.3],
                  "min_samples_split" => [2, 5, 10])

# run grid search (the MethodError is raised by this constructor call)
grid_search = GridSearchCV(clf, param_grid, return_train_score=true)

# features_new and labels_new are my own feature matrix and label vector
start = @elapsed begin
    fit!(grid_search, features_new, labels_new)
end
println("GridSearchCV took $start seconds")

report(grid_search.grid_scores_)
```

I guess you can use the dataset from https://scikitlearnjl.readthedocs.io/en/latest/quickstart/ for testing.

The error message is:

MethodError: no method matching GridSearchCV(; estimator=PyObject RandomForestClassifier(bootstrap=True, class_weight='balanced_subsample',

                   criterion='entropy', max_depth=30, max_features='auto',
                   max_leaf_nodes=None, min_impurity_decrease=0.0,
                   min_impurity_split=None, min_samples_leaf=1,
                   min_samples_split=2, min_weight_fraction_leaf=0.0,
                   n_estimators=100, n_jobs=None, oob_score=False,
                   random_state=None, verbose=0, warm_start=False), param_grid=Dict{String,Array{T,1} where T}("min_samples_split" => [2, 5, 10],"min_impurity_decrease" => [0.0, 0.1, 0.3],"min_samples_leaf" => [1, 5, 10],"max_features" => [1, 6, 12]), return_train_score=true)

Closest candidates are:
GridSearchCV(; estimator, param_grid, scoring, loss_func, score_func, fit_params, n_jobs, iid, refit, cv, verbose, error_score, scorer_, best_params_, best_score_, grid_scores_, best_estimator_) at /Users/admin/.juliapro/JuliaPro_v1.2.0-1/packages/Parameters/l76EM/src/Parameters.jl:466 got unsupported keyword argument "return_train_score"
GridSearchCV(!Matched::GridSearchCV; kws...) at /Users/admin/.juliapro/JuliaPro_v1.2.0-1/packages/Parameters/l76EM/src/Parameters.jl:528
GridSearchCV(!Matched::GridSearchCV, !Matched::AbstractDict) at /Users/admin/.juliapro/JuliaPro_v1.2.0-1/packages/Parameters/l76EM/src/Parameters.jl:531 got unsupported keyword arguments "estimator", "param_grid", "return_train_score"
...
kwerr(::NamedTuple{(:estimator, :param_grid, :return_train_score),Tuple{PyObject,Dict{String,Array{T,1} where T},Bool}}, ::Type) at error.jl:125
(::getfield(Core, Symbol("#kw#Type")))(::NamedTuple{(:estimator, :param_grid, :return_train_score),Tuple{PyObject,Dict{String,Array{T,1} where T},Bool}}, ::Type{GridSearchCV}) at none:0
#GridSearchCV#110(::Base.Iterators.Pairs{Symbol,Bool,Tuple{Symbol},NamedTuple{(:return_train_score,),Tuple{Bool}}}, ::Type{GridSearchCV}, ::PyObject, ::Dict{String,Array{T,1} where T}) at grid_search.jl:545
(::getfield(Core, Symbol("#kw#Type")))(::NamedTuple{(:return_train_score,),Tuple{Bool}}, ::Type{GridSearchCV}, ::PyObject, ::Dict{String,Array{T,1} where T}) at none:0
top-level scope at Train_ML.jl:480
include_string(::Module, ::String, ::String) at sys.dylib:?
include_string(::Module, ::String, ::String, ::Int64) at eval.jl:30
(::getfield(Atom, Symbol("##127#132")){String,Int64,String,Bool})() at eval.jl:94
withpath(::getfield(Atom, Symbol("##127#132")){String,Int64,String,Bool}, ::String) at utils.jl:30
withpath at eval.jl:47 [inlined]
#126 at eval.jl:93 [inlined]
with_logstate(::getfield(Atom, Symbol("##126#131")){String,Int64,String,Bool}, ::Base.CoreLogging.LogState) at logging.jl:395
with_logger at logging.jl:491 [inlined]
#125 at eval.jl:92 [inlined]
hideprompt(::getfield(Atom, Symbol("##125#130")){String,Int64,String,Bool}) at repl.jl:85
macro expansion at eval.jl:91 [inlined]
macro expansion at dynamic.jl:24 [inlined]
(::getfield(Atom, Symbol("##124#129")))(::Dict{String,Any}) at eval.jl:86
handlemsg(::Dict{String,Any}, ::Dict{String,Any}) at comm.jl:164
(::getfield(Atom, Symbol("##19#21")){Array{Any,1}})() at task.jl:268

cstjean · 2020-02-18T17:31:01Z

Thank you for the bug report. My best guess is that return_train_score is a "new" parameter. Unfortunately, ScikitLearn lags behind Python's scikit-learn by a few years. I don't have time to look into it at the moment, but if you would like to make a pull request implementing it, it would be appreciated!

Rohp001 · 2020-07-08T07:08:49Z

You should probably try GridSearchCV(return_train_score=True), with a capital T. It takes a boolean input, which is why the lowercase 't' is not working.

CBongiova · 2020-07-08T08:10:52Z

@Rohp001 I don't think this is the issue. The capital "T" is Python syntax; Julia uses a lowercase "t" for the boolean true.

Rohp001 · 2020-07-09T04:28:02Z

Oh, sorry, I didn't see which language you were using. My bad!
