R 3: Evaluating the New Design Matrix and Interpreting via SHAP
Now we can check whether the extracted interactions actually improve our model. First, we load scikit-learn into R through reticulate; it provides the logistic regression models we want to fit, although base R's lm (or glm, for a logistic fit) also works on these matrices:
sklearn <- reticulate::import("sklearn")
Let's fit three models: a logistic regression without the interactions, a logistic regression with the interactions included, and a balanced random forest:
lr.model <- sklearn$linear_model$LogisticRegression(random_state = 42L, class_weight = 'balanced')$fit(train.test.splits$X.train, train.test.splits$y.train)
lr2.model <- sklearn$linear_model$LogisticRegression(random_state = 42L, class_weight = 'balanced')$fit(X.train2, train.test.splits$y.train)
rf.model <- interactiontransformer$InteractionTransformer$BalancedRandomForestClassifier(random_state = 42L)$fit(train.test.splits$X.train, train.test.splits$y.train)
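For readers who prefer to stay in base R, the effect of adding an interaction column to the design matrix can be sketched with glm. This is only an illustration on simulated data: the variables `x1`, `x2`, and the column name `x1_x2` are hypothetical and are not part of the tutorial's data set or of the InteractionTransformer output.

```r
# Toy data (hypothetical, not the tutorial's data set):
# the outcome depends only on the product of x1 and x2, a pure interaction.
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(2 * x1 * x2))

# Design matrix with main effects only.
X.main <- data.frame(x1 = x1, x2 = x2)
# Design matrix with the interaction appended as an explicit column.
X.int <- cbind(X.main, x1_x2 = x1 * x2)

# Logistic regression on each design matrix.
fit.main <- glm(y ~ ., data = cbind(X.main, y = y), family = binomial())
fit.int  <- glm(y ~ ., data = cbind(X.int, y = y), family = binomial())

# Lower residual deviance means a better in-sample fit.
c(main = deviance(fit.main), interaction = deviance(fit.int))
```

Because the interaction is an ordinary column, its fitted coefficient can be read off with `coef(fit.int)` like any other term, which is the interpretability argument for this kind of design matrix.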
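Next, compute the test-set AUROC (C-statistic) of each model. The `[,2]` index selects the second column of predict_proba, which corresponds to the second class in the model's classes_ (the positive class for 0/1 labels):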
sklearn$metrics$roc_auc_score(train.test.splits$y.test,lr.model$predict_proba(train.test.splits$X.test)[,2])
sklearn$metrics$roc_auc_score(train.test.splits$y.test,lr2.model$predict_proba(X.test2)[,2])
sklearn$metrics$roc_auc_score(train.test.splits$y.test,rf.model$predict_proba(train.test.splits$X.test)[,2])
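To make the metric itself concrete, here is a minimal base R sketch of the AUROC computed through the rank-sum (Mann-Whitney) identity: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The helper name `auroc` and the toy values are hypothetical; they are not results from the models above.

```r
# AUROC via the rank-sum (Mann-Whitney) identity.
# `labels` are 0/1 outcomes; `scores` are predicted positive-class probabilities.
auroc <- function(labels, scores) {
  r     <- rank(scores)          # average ranks handle tied scores
  n.pos <- sum(labels == 1)
  n.neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
}

# Hypothetical toy values, not output from the tutorial's models.
labels <- c(0, 0, 1, 1)
scores <- c(0.10, 0.40, 0.35, 0.80)
auroc(labels, scores)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

A value of 0.5 corresponds to random ranking and 1 to perfect separation, so the three numbers printed above can be compared directly across models.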
With these models fit, we notice that the interactions boost the logistic regression model enough that it can outperform the random forest. The added benefit is that the resulting features are inherently more interpretable than a random forest, which suggests that the extracted interactions are sensible. We have also wrapped some SHAP visualization scripts for you to try:
shap.results.lr<-run.shap(train.test.splits$X.train, train.test.splits$X.test, lr.model, model_type='linear', savefile='../test_data/epistasis.lr.shap.png')
shap.results.rf<-run.shap(train.test.splits$X.train, train.test.splits$X.test, rf.model, model_type='tree', savefile='../test_data/epistasis.rf.shap.png')
shap.results.lr2<-run.shap(X.train2, X.test2, lr2.model, model_type='linear', savefile='../test_data/epistasis.lr2.shap.png')
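As a rough guide to reading the linear-model plots: for a linear model whose features are treated as independent, the SHAP value of feature j for one observation reduces to the coefficient times the feature's deviation from its background mean, on the log-odds scale. The sketch below illustrates that identity in base R under that independence assumption. It is not the internals of `run.shap`, and the data, coefficients, and feature names are hypothetical.

```r
set.seed(1)
# Hypothetical background data and linear (log-odds) model coefficients.
X.bg  <- matrix(rnorm(200 * 3), ncol = 3,
                dimnames = list(NULL, c("a", "b", "c")))
beta0 <- -0.5
beta  <- c(a = 1.5, b = -2.0, c = 0.0)

# Observations to explain.
x.new <- X.bg[1:5, , drop = FALSE]

# SHAP value of feature j: beta_j * (x_j - mean of feature j in the background).
centered <- sweep(x.new, 2, colMeans(X.bg), "-")
phi      <- sweep(centered, 2, beta, "*")

# Base value: the model output at the background mean.
base.value <- beta0 + sum(beta * colMeans(X.bg))

# Additivity: base value plus the SHAP values recovers each prediction.
pred <- beta0 + as.vector(x.new %*% beta)
all.equal(base.value + rowSums(phi), pred)
```

Under this view, an interaction column in the augmented design matrix receives its own SHAP value like any other feature, which is why the plot for the interaction model can surface interaction terms directly.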
That concludes our quick demo of extending the model-building capacity of a traditional modeling approach with the InteractionTransformer. We hope you find this tool helpful. We are always open to adding functionality to the workflow, so if you have something in mind, please let us know in the Issues section. Thanks!