From 09e750f4db4dc46d33d00ae3011643bd96190236 Mon Sep 17 00:00:00 2001
From: Tiffany Tang
Date: Wed, 16 Aug 2023 10:08:22 -0400
Subject: [PATCH] add mdi+ to readme

---
 readme.md | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/readme.md b/readme.md
index f8ddac9..debc69c 100644
--- a/readme.md
+++ b/readme.md
@@ -1,11 +1,11 @@


-Scripts for easily comparing different aspects of the imodels package. Contains code to reproduce FIGS + Hierarchical shrinkage + G-FIGS.
+Scripts for easily comparing different aspects of the imodels package. Contains code to reproduce FIGS + Hierarchical shrinkage + G-FIGS + MDI+.

 # Documentation
-Follow these steps to benchmark a new (supervised) model. If you want to benchmark something like feature importance or unsupervised learning, you will have to make more substantial changes (mostly in `01_fit_models.py`)
+Follow these steps to benchmark a new (supervised) model.
 
 1. Write the sklearn-compliant model (init, fit, predict, predict_proba for classifiers) and add it somewhere in a local folder or in `imodels`
 2. Update configs
 - create a new folder mimicking an existing folder (e.g. `config.interactions`)
@@ -21,6 +21,8 @@ Follow these steps to benchmark a new (supervised) model.
 
 5. put scripts/notebooks into a subdirectory of the `notebooks` folder (e.g. `notebooks/interactions`)
 
+Note: If you want to benchmark feature importances, go to [feature_importance/](https://github.com/Yu-Group/imodels-experiments/tree/master/feature_importance). For benchmarking other tasks such as unsupervised learning, you will have to make more substantial changes (mostly in `01_fit_models.py`).
+
 ## Config
 - When running multiple seeds, we want to aggregate over all keys that are not the split_seed
 - If a hyperparameter is not passed in `ModelConfig` (e.g. because we are using partial), it cannot be aggregated over seeds later on
@@ -77,3 +79,10 @@ Machine learning in high-stakes domains, such as healthcare, faces two critical

G-FIGS 2-step process explained.

+
+
+### MDI+: A Flexible Random Forest-Based Feature Importance Framework
+
+[📄 Paper](https://arxiv.org/pdf/2307.01932.pdf), [📌 Citation](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C23&q=MDI%2B%3A+A+Flexible+Random+Forest-Based+Feature+Importance+Framework&btnG=#d=gs_cit&t=1690399844081&u=%2Fscholar%3Fq%3Dinfo%3Axc0LcHXE_lUJ%3Ascholar.google.com%2F%26output%3Dcite%26scirp%3D0%26hl%3Den)
+
+MDI+ is a novel feature importance framework that generalizes the popular mean decrease in impurity (MDI) importance score for random forests. At its core, MDI+ expands upon a recently discovered connection between linear regression and decision trees. In doing so, MDI+ enables practitioners to (1) tailor the feature importance computation to the data/problem structure and (2) incorporate additional features or knowledge to mitigate known biases of decision trees. In both real-data case studies and extensive real-data-inspired simulations, MDI+ outperforms commonly used feature importance measures (e.g., MDI, permutation-based scores, and TreeSHAP) by substantial margins.
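
Step 1 of the patched readme's benchmarking instructions asks for an sklearn-compliant model exposing `__init__`, `fit`, `predict`, and (for classifiers) `predict_proba`. A minimal sketch of such a classifier follows; the class name and its `smoothing` hyperparameter are illustrative only, not part of `imodels` or this repo:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.multiclass import unique_labels
from sklearn.utils.validation import check_array, check_X_y


class MajorityClassifier(BaseEstimator, ClassifierMixin):
    """Illustrative sklearn-compliant classifier: predicts the majority class.

    Demonstrates the init/fit/predict/predict_proba contract the readme asks for.
    """

    def __init__(self, smoothing=1.0):
        # sklearn convention: store constructor args unmodified so that
        # get_params/set_params (and hence cloning in CV loops) work.
        self.smoothing = smoothing

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self.classes_ = unique_labels(y)
        counts = np.array([(y == c).sum() for c in self.classes_], dtype=float)
        # Laplace-style smoothing controlled by the (hypothetical) hyperparameter.
        smoothed = counts + self.smoothing
        self.proba_ = smoothed / smoothed.sum()
        return self

    def predict_proba(self, X):
        X = check_array(X)
        # Same class-probability vector for every row.
        return np.tile(self.proba_, (X.shape[0], 1))

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
```

Because it follows the estimator contract, an instance can be dropped into the usual sklearn tooling (`cross_val_score`, pipelines) and, per the readme, registered via a `ModelConfig` entry in a new config folder.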
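
The MDI+ paragraph above compares against MDI and permutation-based importance scores; both of those baselines are available in stock scikit-learn. The sketch below (synthetic data, arbitrary hyperparameters) shows how those baselines are typically computed — it is not the MDI+ implementation itself:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Small synthetic problem purely for illustration.
X, y = make_classification(n_samples=200, n_features=5, n_informative=2,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# MDI: impurity-based importances, known to be biased (e.g. toward
# high-cardinality features) -- one of the issues MDI+ aims to mitigate.
mdi = rf.feature_importances_

# Permutation importance: a model-agnostic baseline score.
perm = permutation_importance(rf, X, y, n_repeats=5, random_state=0)
perm_scores = perm.importances_mean
```

MDI+ itself is benchmarked against exactly these kinds of scores in the `feature_importance/` pipeline linked in the note above.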