
Add interpretability example notebooks #21

Open
wants to merge 37 commits into base: obliquepr
Conversation

jshinm

@jshinm jshinm commented May 26, 2022

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Add 3 interpretability example notebooks

  • Iris notebook
  • Simulation notebook
  • MNIST notebook

Any other comments?

@jshinm jshinm requested a review from adam2392 May 26, 2022 07:10
@jshinm jshinm self-assigned this May 26, 2022
Collaborator

@adam2392 adam2392 left a comment


LGTM once the following changes are made:

  • 0. For the simulation notebook: I would remove the Gaussian circles and just focus on sparse parity, since it shows the largest difference. Also remove `max_features=3*n_features`.
  • 1. Move the relevant OF content of notebook/iris_benchmark_OF_vs_RF.ipynb into examples/tree/plot_iris_dtc.py.
  • 2. For the simulation notebook: add a description of the sparse parity problem according to the reference I linked. Here is a paraphrased summary of what we want to say:

    Ref for sparse parity: https://epubs.siam.org/doi/epdf/10.1137/1.9781611974973.56

    Sparse parity is a variation of the noisy parity problem, which is itself a multivariate generalization of the noisy XOR problem. It is a binary classification task in high dimensions.

<describe sparse parity as done in the paper in more laymen terms>

<describe the intuition for why OF would be better than RF>
e.g. OF should be more robust to high-dimensional noise. Moreover, because it can sample more varied candidate splits (i.e., `max_features` can exceed `n_features`, unlike in RF), we expect performance to improve when we are willing to spend extra computation sampling more splits.
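As a concrete illustration of the task the notebook would describe, here is a minimal sparse-parity-style data generator (stdlib-only sketch; the `sparse_parity` name and the default dimension counts are illustrative, not the exact setup from the linked paper):

```python
import random

def sparse_parity(n_samples, n_dims=20, n_signal=3, seed=0):
    """Toy sparse-parity data: only the first `n_signal` of `n_dims`
    uniform features carry signal; the remaining dimensions are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n_dims)]
        # Label = parity of the count of negative signal coordinates.
        y.append(sum(v < 0 for v in x[:n_signal]) % 2)
        X.append(x)
    return X, y
```

Because the label depends on an interaction among the signal coordinates, axis-aligned splits on any single feature are uninformative on their own, which is the intuition for why oblique splits can help here.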

...

Ideally we can try to have this done by Friday so we can show these to sklearn devs at OH on Monday. If you can't have this done by then (I know you have a lot of stuff going on!), please let me know and I can help out so we can have things ready by Monday.

@jshinm
Author

jshinm commented Jun 14, 2022

6/13/2022

TODOS:

  • add absolute-value plots in addition to delta plots
  • grid search parameters
    • test the following parameters
      • n_estimators: range(100, 1000, 100)
      • max_depth: [None, 5, 10, 15, 20]
      • max_features: ['sqrt', 'log2', 1x mtry, 2x mtry]
  • add robustness test over the confusion matrix
  • performance metric (memory [depth, number of leaves] vs accuracy; protocol-5)
  • don't show 2x mtry for RF
  • use a stripplot in addition to the box plot (at alpha of 0.3-0.4)
    • fix the double legend (currently the legend is disabled)
  • plot delta and absolute-value plots separately
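The grid above can be sketched with `itertools.product` (the `"1x_mtry"`/`"2x_mtry"` string labels are placeholders for however the notebook ends up encoding mtry multiples):

```python
from itertools import product

# Parameter grid from the TODO list above; mtry labels are placeholders.
param_grid = {
    "n_estimators": list(range(100, 1000, 100)),        # 100, 200, ..., 900
    "max_depth": [None, 5, 10, 15, 20],
    "max_features": ["sqrt", "log2", "1x_mtry", "2x_mtry"],
}

# Expand into a list of candidate configurations, one dict per combination.
configs = [dict(zip(param_grid, values))
           for values in product(*param_grid.values())]
# 9 * 5 * 4 = 180 configurations to evaluate
```

In practice the same grid could be handed to a cross-validated search utility; the explicit expansion just makes the 180-configuration budget visible up front.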

Additional refs from sklearn dev team

@adam2392
Collaborator

For documentation that will get merged into the PR branch:

  • https://github.com/scikit-learn/scikit-learn/blob/main/doc/modules/tree.rst we should modify this to add a section on "Oblique Trees" with a summary of how they differ from regular decision trees, high-level intuition on when they would be better or not, and the trade-offs to be aware of in terms of fitting/scoring time and classifier size vs. predictive performance.
  • under examples/ensemble/, we should add a file plot_oblique_axis_aligned_forests.py, which compares oblique vs. random forests on a real dataset and perhaps a short version of the sparse-parity simulation. Ideally the entire example, including RF and OF training, can run in under 30 seconds. We can subsample the dataset if needed.

For the real datasets, we can use cnae-9, phishing-websites, and wdbc from OpenML, which seemed to show differing performance for OF and RF.

Ideally we can develop some intuition on why OF or RF does better on one of these...
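To stay within the 30-second budget mentioned above, a simple subsampling helper could be used before training (stdlib-only sketch; the `subsample` name and signature are illustrative, not part of the example file):

```python
import random

def subsample(X, y, n_keep, seed=0):
    """Return a random subset of (X, y) so the example trains quickly."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(X)), n_keep)
    return [X[i] for i in idx], [y[i] for i in idx]
```

A stratified variant would preserve class balance, but for a quick runtime-bounded example a uniform subsample is usually sufficient.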
