Add `as_sklearn` and `from_sklearn` APIs to serialize to CPU sklearn estimators for supported models #6102 (base: branch-25.02)
Does `click` not take care of invalid values being passed? :(
It does, I added it by accident based on habits :P
Do we need the comments? They seem to repeat what the code says. I like comments that explain why the code is the way it is, but I don't think we need that here, as it is pretty straightforward.
Should we not return a deep copy here?
For my education, why would we want to `deepcopy`? Mostly asking because, in my experience, in 99% of cases where someone uses `deepcopy` there is something else we can do instead, or we can just not do it. Python mostly "just works" without deepcopy'ing, hence my interest.
If we simply return a reference to the internal model, any modification to one (additional training or something else) would affect the other. This might create a situation in which the CPU and GPU attributes are out of sync in the cuML estimator. Or inversely, the sklearn estimator returned by the function might silently be updated by the cuML estimator.
Perhaps it should be a parameter. By default most users probably won't care about getting a deep copy, so I wouldn't do it by default, but users who need one could request it. What do you think?
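A minimal sketch of the opt-in behaviour proposed above, assuming a hypothetical estimator that stores its fitted CPU model in an attribute named `_cpu_model`. The class names and the `deepcopy` keyword are illustrative only, not cuML's actual API. It shows the aliasing concern raised earlier in the thread: a returned reference shares state with the estimator, while a deep copy does not.

```python
import copy


class HypotheticalSklearnModel:
    """Stand-in for a fitted scikit-learn estimator (hypothetical)."""

    def __init__(self, coef):
        self.coef_ = list(coef)


class HypotheticalCumlEstimator:
    """Stand-in for a cuML estimator that mirrors a CPU model (hypothetical)."""

    def __init__(self, cpu_model):
        self._cpu_model = cpu_model

    def as_sklearn(self, deepcopy=False):
        # Default: return a reference. Cheap, but the caller and this
        # estimator now share one object, so mutations are visible to both.
        if not deepcopy:
            return self._cpu_model
        # On request: return an independent copy, so later changes on either
        # side cannot leave the CPU and GPU state out of sync.
        return copy.deepcopy(self._cpu_model)


est = HypotheticalCumlEstimator(HypotheticalSklearnModel([1.0, 2.0]))

shared = est.as_sklearn()
shared.coef_[0] = 99.0
print(est._cpu_model.coef_)  # [99.0, 2.0]: the estimator saw the change

independent = est.as_sklearn(deepcopy=True)
independent.coef_[1] = -1.0
print(est._cpu_model.coef_)  # [99.0, 2.0]: unaffected by the copy
```

Defaulting to `deepcopy=False` keeps the common path cheap, as suggested, while still letting callers who need isolation ask for it.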
Would be great to have a global conversion table, so that we don't need to provide the class as a parameter.
This is a class method, so the class is bound automatically; it's not something the user passes (like `self` in non-class methods). A global conversion table will be useful for a follow-up that adds `cuml.from_sklearn`-style library-level functionality, though.
It could be interesting to add an optional parameter to this function to allow a deepcopy of the sklearn model.
lol I asked the same thing before reading this suggestion :)
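The same question applies in the other direction. Here is a small sketch, assuming a hypothetical `deepcopy` keyword on a class-method `from_sklearn` and stand-in classes (not cuML's API), of how an optional copy keeps the converted estimator isolated from later changes to the caller's sklearn model:

```python
import copy


class HypotheticalSklearnModel:
    """Stand-in for a fitted scikit-learn estimator (hypothetical)."""

    def __init__(self, coef):
        self.coef_ = list(coef)


class HypotheticalCumlEstimator:
    """Stand-in for a cuML estimator wrapping a CPU model (hypothetical)."""

    def __init__(self, cpu_model=None):
        self._cpu_model = cpu_model

    @classmethod
    def from_sklearn(cls, model, deepcopy=False):
        # Without a copy, the new estimator shares `model` with the caller,
        # so later changes to the caller's object leak into it.
        return cls(copy.deepcopy(model) if deepcopy else model)


user_model = HypotheticalSklearnModel([0.5, 1.5])
aliased = HypotheticalCumlEstimator.from_sklearn(user_model)
isolated = HypotheticalCumlEstimator.from_sklearn(user_model, deepcopy=True)

user_model.coef_[0] = 7.0
print(aliased._cpu_model.coef_)   # [7.0, 1.5]
print(isolated._cpu_model.coef_)  # [0.5, 1.5]
```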
Do we really need this instead of using `42` in the tests directly? We could have a global version of this that allows us to run the tests with several seeds, but maybe that's something to tackle in a future PR.
We don't really need it at all
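For the possible follow-up about running tests with several seeds, a minimal sketch assuming a hypothetical module-level `SEEDS` list and a hypothetical `make_data` helper. With pytest one would typically write `@pytest.mark.parametrize("seed", SEEDS)`; a plain loop is used here so the sketch runs with the standard library only.

```python
import random

# Hypothetical module-level list: a single value today, but tests written
# against it could later run across several seeds without other edits.
SEEDS = [42]


def make_data(seed, n=5):
    """Generate reproducible toy data from a seed (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def check_reproducible(seed):
    # The same seed must produce identical data on every call.
    assert make_data(seed) == make_data(seed)


for seed in SEEDS:
    check_reproducible(seed)
```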