Different Scale of SHAP values for Approx vs. ExactSHAP #31

Original post by @simonschoe:

Hi there,
great work with the package first and foremost.
Quick question: does the ApproxSHAP method scale or standardize the SHAP values in any way? When I create global feature attribution rankings for a GBM using ApproxSHAP as well as TreeSHAP, the SHAP values end up on substantially different scales. For example, with ApproxSHAP the mean absolute values are in the range 0.01-0.18, while with TreeSHAP they lie between 1 and 14.
Thanks in advance!

Comments
Hi @simonschoe, it would be helpful if you could post a reproducible example for me to run on my end. In general, the approximate method used by fastshap depends on the variance of the feature columns. Some problems will require more Monte Carlo reps (say, …
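For what it's worth, here is a minimal, self-contained sketch of the kind of comparison being discussed; the data, model, and nsim values are illustrative assumptions, not anything from this thread, and it assumes a fastshap version where `exact = TRUE` is supported for "lm" objects (Linear SHAP):

```r
library(fastshap)

# Simulated data with feature columns of very different variance
set.seed(101)
X <- data.frame(x1 = rnorm(500), x2 = rnorm(500, sd = 10))
y <- 2 * X$x1 - X$x2 + rnorm(500)
fit <- lm(y ~ ., data = cbind(X, y = y))

pfun <- function(object, newdata) predict(object, newdata = newdata)

# Exact (Linear SHAP) vs. Monte Carlo approximations with few/many reps
ex_exact <- explain(fit, X = X, exact = TRUE)
ex_few   <- explain(fit, X = X, pred_wrapper = pfun, nsim = 1)
ex_many  <- explain(fit, X = X, pred_wrapper = pfun, nsim = 1000)

# Mean absolute SHAP values; the nsim = 1000 run should sit much closer
# to the exact values than the nsim = 1 run
colMeans(abs(as.data.frame(ex_exact)))
colMeans(abs(as.data.frame(ex_few)))
colMeans(abs(as.data.frame(ex_many)))
```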
Hi @bgreenwell, thanks for your reply, and sorry for the delay. Unfortunately, it is difficult for me to provide a reproducible example, since the entire workflow is predicated on proprietary data. What I can provide, however, are the two snippets that run TreeSHAP and ApproxSHAP on my machine, respectively:

```r
# Exact TreeSHAP on the fitted engine
shap_values_gbm <- fastshap::explain(
  extract_fit_engine(final_fit_gbm),
  X = X_gbm,
  pred_wrapper = function(object, newdata) predict(object, newdata),
  exact = TRUE,
  newdata = NULL,
  .parallel = TRUE
)

# Monte Carlo approximation with 1,000 replications
shap_values_gbm2 <- fastshap::explain(
  extract_fit_engine(final_fit_gbm),
  X = X_gbm,
  pred_wrapper = function(object, newdata) predict(object, newdata),
  nsim = 1000, adjust = TRUE,
  newdata = NULL,
  .parallel = TRUE
)
```

The resulting top-10 rankings look as follows (the code in between the computation of …).

[top-10 ranking plots from the original issue not reproduced here]

Hope that this may provide some context as to why/how the difference occurs?

Best, Simon
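Since the ranking plots are not reproduced here, the following hypothetical helper (top10 is not a fastshap function) illustrates one common way such a ranking is derived from a matrix of SHAP values, assuming shap_values_gbm and shap_values_gbm2 as computed above:

```r
# Hypothetical helper: rank features by mean absolute SHAP value
top10 <- function(shap) {
  head(sort(colMeans(abs(as.data.frame(shap))), decreasing = TRUE), 10)
}

top10(shap_values_gbm)   # TreeSHAP ranking
top10(shap_values_gbm2)  # ApproxSHAP ranking
```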
@simonschoe The only thing I can think of is the scale on which the Shapley values are being returned in each approach. For example, for a binary outcome in a GLM, Shapley values could be returned on either the link or the response scale. The …
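One way to rule the scale issue in or out, assuming the underlying engine is xgboost (an assumption; the thread never names the engine), is to pin the Monte Carlo wrapper to the same margin (link) scale that xgboost's TreeSHAP contributions are reported on:

```r
# Sketch only: assumes extract_fit_engine(final_fit_gbm) returns an
# xgb.Booster and that X_gbm is a numeric matrix xgboost can predict on.
# outputmargin = TRUE returns raw margin scores rather than probabilities,
# matching the scale of xgboost's TreeSHAP contributions.
pfun_margin <- function(object, newdata) {
  predict(object, newdata = newdata, outputmargin = TRUE)
}

shap_values_margin <- fastshap::explain(
  extract_fit_engine(final_fit_gbm),
  X = X_gbm,
  pred_wrapper = pfun_margin,
  nsim = 1000, adjust = TRUE
)
```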
@bgreenwell But if it were simply a scaling issue, shouldn't I at least obtain somewhat similar rank orderings and the same signs for the effects? In the example above, despite running 1,000 simulations, the two rankings are still very different from each other...
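A quick way to quantify that disagreement (a sketch, assuming shap_values_gbm and shap_values_gbm2 as computed above, with identical column order) is to correlate the two importance vectors and check sign agreement directly:

```r
# Rank agreement of the global importances (scale-invariant)
imp1 <- colMeans(abs(as.data.frame(shap_values_gbm)))   # TreeSHAP
imp2 <- colMeans(abs(as.data.frame(shap_values_gbm2)))  # ApproxSHAP
cor(imp1, imp2, method = "spearman")

# Proportion of per-observation attributions that agree in sign
mean(sign(as.matrix(shap_values_gbm)) == sign(as.matrix(shap_values_gbm2)))
```

If the Spearman correlation is high, the two methods agree on ordering and only the scale differs; if it is low, the discrepancy goes beyond a simple rescaling.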