Scarliles/splitter injection #61
base: submodulev3
❌ Linting issues: This PR is introducing linting issues.
A few initial thoughts:
- If this has negligible performance overhead compared to the scikit-learn fork and scikit-learn main (and I assume the same would then hold for scikit-tree), then very cool, and I think this is the right way to go. So my immediate thought is to compare runtime performance vs. n_samples and n_dimensions for 100, 500 and 1000 trees. This is probably the most important thing to do ASAP to determine whether this is a suitable route. Excited to chat about that.
- If there are no performance issues, then my next question is: how do we make this as usable as possible? Ideally we would do some checks upon instantiation of the Splitter class, so that it can determine whether a given SplitConditionTuple is valid and will not result in a segfault, etc. Right now it seems very easy to trigger one, so perhaps we can construct some guardrails? I don't know how best to do this at the moment, though.
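The runtime comparison suggested above could be organized roughly like this. It is only a sketch of a timing grid: `time_fit`, `run_grid` and `dummy_fit` are hypothetical names, and `dummy_fit` is a placeholder workload that would be swapped for a real forest fit from each build being compared (fork, main, this branch).

```python
import itertools
import time


def time_fit(fit_fn, n_samples, n_features, n_trees, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fit_fn(n_samples, n_features, n_trees)
        best = min(best, time.perf_counter() - start)
    return best


def run_grid(fit_fns, n_samples_grid, n_features_grid, n_trees_grid=(100, 500, 1000)):
    """Time every implementation on every (n_samples, n_features, n_trees) cell."""
    rows = []
    for n_samples, n_features, n_trees in itertools.product(
        n_samples_grid, n_features_grid, n_trees_grid
    ):
        row = {"n_samples": n_samples, "n_features": n_features, "n_trees": n_trees}
        for name, fit_fn in fit_fns.items():
            row[name] = time_fit(fit_fn, n_samples, n_features, n_trees)
        rows.append(row)
    return rows


def dummy_fit(n_samples, n_features, n_trees):
    # Placeholder workload (assumption): replace with a real forest fit per build.
    return sum(i % 7 for i in range(n_trees * 10))


if __name__ == "__main__":
    for row in run_grid({"dummy": dummy_fit}, [100, 1000], [5, 20]):
        print(row)
```

Passing several entries in `fit_fns` puts the timings for each build side by side in the same row, which makes the per-cell overhead easy to read off.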
To be explicit, there are two sus moves happening in this design pattern:
These are what I would characterize as necessary evils resulting from technical constraints of Cython -- the conditions as executed in say
So my intent with the wrapper classes was that those (function, parameter struct) tuple structs never be created by hand. When you want to add a new type of condition, you write the condition function, the function-specific parameter payload struct, and the extension type wrapper that provides a usable Python interface; the condition functions and parameter structs themselves are never used by hand. Once the wrapper class is there, it can be used dynamically in any Python context without having to disturb existing Cython code.
To mitigate the aforementioned shortcomings, only two things immediately come to my mind:
I am definitely open to other suggestions.
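As a rough pure-Python analogue of the wrapper pattern described above (not the PR's Cython code), the idea might look like the sketch below. `SplitConditionTuple` and `Splitter` reuse names from this discussion, but everything here is a stand-in: the `SplitCondition` base class, `MinSamplesLeafCondition`, `MinSamplesLeafParams`, the `accepts` method and the condition signature are all hypothetical. The point is that only wrappers build the (function, parameters) pair, and validation happens at construction time.

```python
from dataclasses import dataclass
from typing import Any, Callable, NamedTuple


class SplitConditionTuple(NamedTuple):
    """Stand-in for the (function, parameter struct) pair."""
    fn: Callable[[int, int, Any], bool]
    params: Any


@dataclass(frozen=True)
class MinSamplesLeafParams:
    """Function-specific parameter payload (hypothetical)."""
    min_samples: int


def _min_samples_leaf(n_left, n_right, params):
    # Condition function: accept a split only if both children are big enough.
    return n_left >= params.min_samples and n_right >= params.min_samples


class SplitCondition:
    """Base wrapper; subclasses are the only code that builds a tuple."""
    _t: SplitConditionTuple

    @property
    def t(self):
        return self._t


class MinSamplesLeafCondition(SplitCondition):
    def __init__(self, min_samples):
        # Guardrail: validate the payload before pairing it with the function.
        if isinstance(min_samples, bool) or not isinstance(min_samples, int):
            raise ValueError("min_samples must be an integer")
        if min_samples < 1:
            raise ValueError("min_samples must be >= 1")
        self._t = SplitConditionTuple(_min_samples_leaf, MinSamplesLeafParams(min_samples))


class Splitter:
    """Toy splitter that only accepts wrapper instances, never raw tuples."""

    def __init__(self, conditions=()):
        conditions = list(conditions)
        for cond in conditions:
            if not isinstance(cond, SplitCondition):
                raise TypeError(f"expected a SplitCondition, got {type(cond).__name__}")
        self._conditions = [cond.t for cond in conditions]

    def accepts(self, n_left, n_right):
        # A candidate split is kept only if every injected condition accepts it.
        return all(t.fn(n_left, n_right, t.params) for t in self._conditions)


if __name__ == "__main__":
    splitter = Splitter([MinSamplesLeafCondition(2)])
    print(splitter.accepts(2, 3), splitter.accepts(1, 5))
```

Rejecting hand-built tuples in the constructor is one possible guardrail; it does not make the underlying function pointers safe, it only narrows the ways a mismatched function and payload can be paired.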
Say we implement the template in C++. Would it be a lot of work, do you think? I'm not opposed to supporting C++ code as long as it's short and enables stuff we otherwise can't do easily with Cython alone. If uncertain, then I'm okay keeping the current design for now. I'd rather get something working and benchmarked first.
My personal opinion: we're using a duck-typed language to wrap the mother of all unsafe languages. It's the programming equivalent of duct-taping a rocket launcher to a table saw. I think it's fair to do the two mitigations I proposed (and any other simple ones that come to mind) and accept that future devs have some responsibility to know what they're doing.
Fair point. Let's see how the benchmarks turn out then with the current approach!
One additional note: it would also be good to confirm this parallelizes fine: #61 (review). E.g. n_jobs = 1 vs. n_jobs > 1.
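A minimal sketch of what such a check could look like: run the same seeded work serially and in parallel, confirm the results match, and record both timings. This uses a thread pool from the standard library as a stand-in for the estimator's `n_jobs` machinery; `fit_one_tree`, `fit_forest` and `compare` are hypothetical names, and the per-tree work is a placeholder.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def fit_one_tree(seed):
    # Placeholder for fitting one tree; deterministic given its seed.
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(10_000))


def fit_forest(n_trees, n_jobs):
    seeds = range(n_trees)
    if n_jobs == 1:
        return [fit_one_tree(seed) for seed in seeds]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Executor.map returns results in input order, so outputs are comparable.
        return list(pool.map(fit_one_tree, seeds))


def compare(n_trees=20, n_jobs=4):
    """Return (results_match, serial_seconds, parallel_seconds)."""
    t0 = time.perf_counter()
    serial = fit_forest(n_trees, 1)
    t1 = time.perf_counter()
    parallel = fit_forest(n_trees, n_jobs)
    t2 = time.perf_counter()
    return serial == parallel, t1 - t0, t2 - t1


if __name__ == "__main__":
    same, serial_s, parallel_s = compare()
    print(f"match={same} serial={serial_s:.3f}s parallel={parallel_s:.3f}s")
```

This pure-Python stand-in will usually not show a speedup, since the threads contend for the interpreter lock; the useful part is the shape of the check (identical results, both timings recorded), which would be applied to the real estimator with n_jobs = 1 and n_jobs > 1.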
… memory utilization in asv
…ding benefit, reverting
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Implements the ability to inject split accept/reject conditions into Splitter from Python.
Any other comments?
Function signatures and code organization are expected to undergo further refactoring.