GridSearchCV does not return training score #66

Open
CBongiova opened this issue Feb 12, 2020 · 6 comments

@CBongiova

Hi,

I am trying to return the training score through GridSearchCV. Looking at the scikit-learn documentation, I saw that I should normally be able to pass the keyword argument "return_train_score=true".
However, when I try it in Julia, I get a MethodError.

grid_search = GridSearchCV(clf, param_grid,return_train_score=true)

Does anyone know how to retrieve train scores correctly?
Thanks!

@cstjean
Owner

cstjean commented Feb 12, 2020

Could you please post the error message? Some minimal code that demonstrates the issue would be appreciated as well!

@CBongiova
Author

@cstjean Thanks for replying

The code is actually pretty straightforward (I took an example from the ScikitLearn website):

```julia
############# Grid search
using ScikitLearn
using ScikitLearn.GridSearch: GridSearchCV
using Printf: @printf
using Statistics: std
@sk_import ensemble: RandomForestClassifier

clf = RandomForestClassifier(n_estimators=100, class_weight="balanced_subsample",
                             criterion="entropy", max_depth=30)

# Utility function to report best scores
function report(grid_scores, n_top=3)
    top_scores = sort(grid_scores, by=x -> x.mean_validation_score, rev=true)[1:n_top]
    for (i, score) in enumerate(top_scores)
        println("Model with rank: $i")
        @printf("Mean validation score: %.3f (std: %.3f)\n",
                score.mean_validation_score,
                std(score.cv_validation_scores))
        println("Parameters: $(score.parameters)")
        println("")
    end
end

# use a full grid over all parameters
param_grid = Dict("max_features" => [1, 6, 12],
                  "min_samples_leaf" => [1, 5, 10],
                  "min_impurity_decrease" => [0, 0.1, 0.3],
                  "min_samples_split" => [2, 5, 10])

# run grid search (this call raises the MethodError shown below)
grid_search = GridSearchCV(clf, param_grid, return_train_score=true)

# features_new and labels_new are my own training data
start = @elapsed begin
    fit!(grid_search, features_new, labels_new)
end
println("GridSearchCV took $start seconds")

report(grid_search.grid_scores_)
```

I guess you can use the dataset from https://scikitlearnjl.readthedocs.io/en/latest/quickstart/ for testing.

The error message is:

```
MethodError: no method matching GridSearchCV(; estimator=PyObject RandomForestClassifier(bootstrap=True, class_weight='balanced_subsample',
                   criterion='entropy', max_depth=30, max_features='auto',
                   max_leaf_nodes=None, min_impurity_decrease=0.0,
                   min_impurity_split=None, min_samples_leaf=1,
                   min_samples_split=2, min_weight_fraction_leaf=0.0,
                   n_estimators=100, n_jobs=None, oob_score=False,
                   random_state=None, verbose=0, warm_start=False), param_grid=Dict{String,Array{T,1} where T}("min_samples_split" => [2, 5, 10],"min_impurity_decrease" => [0.0, 0.1, 0.3],"min_samples_leaf" => [1, 5, 10],"max_features" => [1, 6, 12]), return_train_score=true)

Closest candidates are:
GridSearchCV(; estimator, param_grid, scoring, loss_func, score_func, fit_params, n_jobs, iid, refit, cv, verbose, error_score, scorer_, best_params_, best_score_, grid_scores_, best_estimator_) at /Users/admin/.juliapro/JuliaPro_v1.2.0-1/packages/Parameters/l76EM/src/Parameters.jl:466 got unsupported keyword argument "return_train_score"
GridSearchCV(!Matched::GridSearchCV; kws...) at /Users/admin/.juliapro/JuliaPro_v1.2.0-1/packages/Parameters/l76EM/src/Parameters.jl:528
GridSearchCV(!Matched::GridSearchCV, !Matched::AbstractDict) at /Users/admin/.juliapro/JuliaPro_v1.2.0-1/packages/Parameters/l76EM/src/Parameters.jl:531 got unsupported keyword arguments "estimator", "param_grid", "return_train_score"
...
kwerr(::NamedTuple{(:estimator, :param_grid, :return_train_score),Tuple{PyObject,Dict{String,Array{T,1} where T},Bool}}, ::Type) at error.jl:125
(::getfield(Core, Symbol("#kw#Type")))(::NamedTuple{(:estimator, :param_grid, :return_train_score),Tuple{PyObject,Dict{String,Array{T,1} where T},Bool}}, ::Type{GridSearchCV}) at none:0
#GridSearchCV#110(::Base.Iterators.Pairs{Symbol,Bool,Tuple{Symbol},NamedTuple{(:return_train_score,),Tuple{Bool}}}, ::Type{GridSearchCV}, ::PyObject, ::Dict{String,Array{T,1} where T}) at grid_search.jl:545
(::getfield(Core, Symbol("#kw#Type")))(::NamedTuple{(:return_train_score,),Tuple{Bool}}, ::Type{GridSearchCV}, ::PyObject, ::Dict{String,Array{T,1} where T}) at none:0
top-level scope at Train_ML.jl:480
include_string(::Module, ::String, ::String) at sys.dylib:?
include_string(::Module, ::String, ::String, ::Int64) at eval.jl:30
(::getfield(Atom, Symbol("##127#132")){String,Int64,String,Bool})() at eval.jl:94
withpath(::getfield(Atom, Symbol("##127#132")){String,Int64,String,Bool}, ::String) at utils.jl:30
withpath at eval.jl:47 [inlined]
#126 at eval.jl:93 [inlined]
with_logstate(::getfield(Atom, Symbol("##126#131")){String,Int64,String,Bool}, ::Base.CoreLogging.LogState) at logging.jl:395
with_logger at logging.jl:491 [inlined]
#125 at eval.jl:92 [inlined]
hideprompt(::getfield(Atom, Symbol("##125#130")){String,Int64,String,Bool}) at repl.jl:85
macro expansion at eval.jl:91 [inlined]
macro expansion at dynamic.jl:24 [inlined]
(::getfield(Atom, Symbol("##124#129")))(::Dict{String,Any}) at eval.jl:86
handlemsg(::Dict{String,Any}, ::Dict{String,Any}) at comm.jl:164
(::getfield(Atom, Symbol("##19#21")){Array{Any,1}})() at task.jl:268
```

@cstjean
Owner

cstjean commented Feb 18, 2020

Thank you for the bug report. My best guess is that return_train_score is a "new" parameter: ScikitLearn unfortunately lags a few years behind the Python scikit-learn. I don't have time to look into it at the moment, but if you would like to make a pull request implementing it, it would be appreciated!
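The "Closest candidates" list in the error shows every keyword the `GridSearchCV` constructor accepts, and `return_train_score` is not among them, so Julia rejects the call before any fitting happens. A minimal sketch of that behaviour, assuming a hypothetical `ToyGridSearch` struct built with `Base.@kwdef` as a stand-in for the real type (which is generated with Parameters.jl and has many more fields):

```julia
# Hypothetical stand-in for GridSearchCV; only three illustrative fields.
Base.@kwdef struct ToyGridSearch
    estimator::Any = nothing
    param_grid::Dict = Dict()
    refit::Bool = true
end

# Every keyword here matches a field, so construction succeeds.
ok = ToyGridSearch(estimator = :clf, param_grid = Dict("max_depth" => [10, 30]))

# `return_train_score` is not a field, so the generated keyword constructor
# rejects it with a MethodError, like the one in the report above.
err = try
    ToyGridSearch(estimator = :clf, return_train_score = true)
    nothing
catch e
    e
end
println(err isa MethodError)  # true
```

Supporting the option would presumably mean adding a `return_train_score` field to the type and computing training scores during `fit!`; that is an assumption about the shape of a fix, not existing package behaviour.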

@Rohp001

Rohp001 commented Jul 8, 2020

You should probably try GridSearchCV(return_train_score=True), with a capital T. Since it takes a Boolean input, that is probably why the lowercase 't' is not working.

@CBongiova
Author

@Rohp001 I don't think this is the issue: the capital "T" is Python syntax. Julia uses a lowercase "t" for the Boolean `true`.
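A quick check in plain Julia illustrates the point: `true` is the built-in Boolean literal, while `True` is not defined at all, so passing it would fail with an `UndefVarError` rather than the `MethodError` reported above.

```julia
# Julia's Boolean literals are lowercase.
println(true isa Bool)   # true

# `True` is not a built-in name in Julia; evaluating it throws an UndefVarError.
is_undefined = try
    True
    false
catch e
    e isa UndefVarError
end
println(is_undefined)    # true
```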

@Rohp001

Rohp001 commented Jul 9, 2020

Oh, sorry, I didn't see which language you were using. My bad!


3 participants