Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement SPORF #374

Merged
merged 19 commits into from
Dec 15, 2020
Merged

Implement SPORF #374

merged 19 commits into from
Dec 15, 2020

Conversation

parthgvora
Copy link
Contributor

@parthgvora parthgvora commented Nov 20, 2020

Reference issue

Closes #351, #352.

Type of change

Feature request

What does this implement/fix?

  • Scikit-learn compliant implementation of SPORF written in pure Python.
    • SPORF decision tree found in oblique_tree.py. Implements fit and predict functions as per sklearn requirements.
    • ObliqueTreeClassificationTransformer added in transformers.py
    • Unit tests for oblique_tree.py in tests/oblique_tree_test.py
  • Tutorial notebook for the proglearn implementation of SPORF is provided here.
    • Notebook shows that proglearn SPORF produces the same results as the paper on the following UCI datasets: Hill-Valley, acute inflammation task 1, acute inflammation task 2

Additional information

  • Feature request for scikit-learn Oblique Random Forests written here.
  • Code is written with modularity in mind, to allow for easy implementation of other oblique trees. For example, implementing MORF just requires implementing a new splitter class.
  • Joint work of @parthgvora and @jmandavilli, with tasks split up according to provided issues.
  • @jdey4 @levinwil, please let us know if you would like to make any changes.

Copy link
Collaborator

@levinwil levinwil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For oblique tree, could you please make sure all of the docstrings have underscores that match the lengths of the headers. For example, line 16 should be enough underscores to cover the "Parameters" word. Also, you shouldn't have colons after those words.

This is to make sure things render in Sphinx correctly. Once you make those changes, can you please check the web docs to make sure things render appropriately? For reference, please check the other docs to see what it should look like and update until your docs looks like that.

Copy link
Collaborator

@levinwil levinwil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the travis tests failed. One of the reasons is you didn't include the oblique tree in the ProgLearn package. To do so, please add it to https://github.com/parthgvora/ProgLearn/blob/staging/proglearn/__init__.py

Please update the code until the travis tests pass. If the code does not pass a test, Travis will show you those errors if you check the Travis console.

@parthgvora
Copy link
Contributor Author

Thanks for the help Will! We'll make the changes and get back to you.

@jdey4
Copy link
Collaborator

jdey4 commented Nov 20, 2020

@parthgvora we also need all the benchmarks that show your implementation works as well as the main paper.

@parthgvora
Copy link
Contributor Author

parthgvora commented Nov 20, 2020

@parthgvora we also need all the benchmarks that show your implementation works as well as the main paper.

This should be in the notebook - sorry for any confusion. We ran the code on 3 UCI datasets that SPORF was able to classify perfectly; our code was also able to classify them perfectly. Notably, we could classify the Hill-Valley dataset perfectly, which RF was unable to do.

@jdey4
Copy link
Collaborator

jdey4 commented Nov 20, 2020

I will go through it in detail. It may take time.

@parthgvora
Copy link
Contributor Author

parthgvora commented Nov 20, 2020

I will go through it in detail. It may take time.

No problem - please let us know what you think :)

If you have any questions please let me know

Copy link
Member

@PSSF23 PSSF23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parthgvora @jmandavilli please black format your files to pass the travis test.

@PSSF23
Copy link
Member

PSSF23 commented Nov 21, 2020

@parthgvora there were some strange indentations of docstrings, and I fixed them for you. This fix should not affect black formatting.

@levinwil
Copy link
Collaborator

@PSSF23 It looks good to me now. What do you think?

@PSSF23
Copy link
Member

PSSF23 commented Nov 23, 2020

@levinwil Jayanta said he was going to look through the benchmarks. Otherwise, to my limit of knowledge, everything looks good.

@levinwil
Copy link
Collaborator

@jdey4 Have you had a chance to look this over?

@jdey4
Copy link
Collaborator

jdey4 commented Nov 24, 2020

@levinwil nope.

@levinwil
Copy link
Collaborator

@parthgvora Please remember to add all parameters to the docstrings

@parthgvora
Copy link
Contributor Author

@parthgvora Please remember to add all parameters to the docstrings

Yep, sorry! We're just going through the changes you suggested.
We'll have docstrings and the tutorial notebooks in soon, like tonight or tomorrow.

Thanks for your help!

@jmandavilli
Copy link
Contributor

I just added the parameters to the docstrings as you suggested Will, so that should be good now. Sorry for the delay.

@levinwil
Copy link
Collaborator

This looks good to me.

@jdey4 you mentioned that you wanted to go through this - have you had the chance to?

@PSSF23 do you have any comments and/or concerns? I don’t see any glaring problems.

@PSSF23
Copy link
Member

PSSF23 commented Dec 12, 2020

@levinwil should the tutorials be included in the rst file?

@levinwil
Copy link
Collaborator

@PSSF23 yup!

@PSSF23
Copy link
Member

PSSF23 commented Dec 12, 2020

@levinwil then do you think the tutorial names should be more consistent? Like sporf_datasets & sporf_decision_boundaries or oblique_datasets & oblique_decision_boundaries?

@PSSF23
Copy link
Member

PSSF23 commented Dec 12, 2020

@parthgvora @jmandavilli I added your tutorials to the rst file and resolved conflicts due to outdated branches. Yet as I said in the previous comment, I think more consistent names would help.

@jmandavilli
Copy link
Contributor

Thank you for taking care of that @PSSF23! We can change the file names once Will confirms. Thank you guys for all of the help.

@levinwil
Copy link
Collaborator

I agree with @PSSF23: the tutorial names should be consistent. As for if you want the prefix to be oblique or sporf - that is up to you. I don’t have a strong opinion either way.

@levinwil
Copy link
Collaborator

@jmandavilli @parthgvora Have you all renamed the notebooks?

@jmandavilli
Copy link
Contributor

@jmandavilli @parthgvora Have you all renamed the notebooks?

Hey, I'm really sorry I have a final in a couple of hours. I will rename the notebooks right after

@jmandavilli
Copy link
Contributor

jmandavilli commented Dec 15, 2020

Just renamed the notebooks, sorry for the delay. I updated tutorials.rst as well.

Let us know if there is anything else. Thanks again.

PSSF23 and others added 2 commits December 14, 2020 19:38
…ecision_boundaries_functions.py

Co-Authored-By: jmandav1 <[email protected]>
Co-Authored-By: parthgvora <[email protected]>
Co-Authored-By: jmandav1 <[email protected]>
Co-Authored-By: parthgvora <[email protected]>
@jmandavilli
Copy link
Contributor

Thanks for catching that Hao. My bad.

@PSSF23
Copy link
Member

PSSF23 commented Dec 15, 2020

All of my concerns were addressed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants