partial() not working correctly for H2O GLM #127
Comments
Apologies, I found my mistake. I didn't account for the fact that, compared to GBMs, you get standard errors with H2O GLM predictions. Selecting only the first
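A minimal sketch of that kind of fix, assuming the prediction comes back as a multi-column frame whose first column holds the point estimate. The helper name `first_prediction_column` and the column names `predict` and `StdErr` are hypothetical, used only for illustration; the stand-in data frame replaces a real H2O prediction so the snippet runs without a cluster.

```r
# Hypothetical helper: keep only the first column of a multi-column prediction.
first_prediction_column <- function(pred) {
  pred <- as.data.frame(pred)
  pred[[1L]]
}

# Stand-in for a prediction frame holding a point estimate and a standard
# error (column names are assumptions, not taken from the issue).
fake_pred <- data.frame(
  predict = c(1.2, 0.8, 1.5),
  StdErr  = c(0.10, 0.20, 0.15)
)

# Only the point estimates are averaged; the standard errors are dropped.
yhat <- first_prediction_column(fake_pred)
mean(yhat)

# With pdp, a wrapper along these lines could be passed as `pred.fun`
# (untested sketch; requires a running H2O cluster):
# pred_h2o <- function(object, newdata) {
#   first_prediction_column(
#     h2o::h2o.predict(object, newdata = h2o::as.h2o(newdata))
#   )
# }
```

Because the wrapper returns one value per row, `partial()` would leave the rows unaggregated (ICE-style output); wrapping the result in `mean()` inside the function would give the averaged partial dependence instead.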
Thanks for the note @RoelVerbelen, glad you found the issue! The bigger bottleneck here is probably attributed to the multiple calls needed to coerce the data with
# Load required packages
library(dplyr)
library(pdp)
library(sparklyr)
data(boston, package = "pdp")
sc <- spark_connect(master = 'local')
boston_sc <- copy_to(sc, boston, overwrite = TRUE)
rfo <- boston_sc %>% ml_random_forest(cmedv ~ ., type = "auto")
# Define plotting grid
df1 <- data.frame(lstat = quantile(boston$lstat, probs = 1:19/20)) %>%
  copy_to(sc, df = .)
# Remove plotting variable from training data
df2 <- boston %>%
  select(-lstat) %>%
  copy_to(sc, df = .)
# Perform a cross join, compute predictions, then aggregate
par_dep <- df1 %>%
  full_join(df2, by = character()) %>%  # cartesian product
  ml_predict(rfo, dataset = .) %>%
  group_by(lstat) %>%
  summarize(yhat = mean(prediction)) %>%  # average for partial dependence
  select(lstat, yhat) %>%  # select plotting variables
  arrange(lstat) %>%  # for plotting purposes
  collect()
# Plot results
plot(par_dep, type = "l")
You can try this with your h2o example. If you see a significant improvement, then I'd consider revisiting the idea! (Although, I think the original implementation used data.table to do the joins and aggregations).
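For anyone without a Spark or H2O session handy, the same cross join, predict, aggregate pattern can be sketched in base R. This is only an illustration under stated assumptions: a linear model on the built-in `mtcars` data stands in for the random forest, and `wt` plays the role of `lstat`.

```r
# Fit a simple model on a built-in dataset (stand-in for the Spark random forest)
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Define plotting grid for the variable of interest
grid <- data.frame(wt = quantile(mtcars$wt, probs = 1:19 / 20))

# Remove plotting variable from training data
rest <- mtcars[, setdiff(names(mtcars), "wt")]

# Cross join: merge() with by = NULL returns the Cartesian product
crossed <- merge(grid, rest, by = NULL)

# Compute predictions once on the full cross-joined data
crossed$yhat <- predict(fit, newdata = crossed)

# Average predictions within each grid value for partial dependence
par_dep <- aggregate(yhat ~ wt, data = crossed, FUN = mean)

# Plot results
plot(par_dep, type = "l")
```

The point of the pattern is that the model's predict method is called once on one large data set rather than once per grid value, which is where the coercion overhead discussed above would otherwise pile up.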
Hi @bgreenwell, thanks for the response! Absolutely, it's the conversion from R to H2O (using
I love how general pdp is, so having this
Sorry for the delay @RoelVerbelen, that is a convincing example! I'll reopen the issue, but not sure when I'll get to it. Should be easy to resurrect the old branch and grab the code.
Oddly, I'm not getting sensible results for GLMs using H2O. Effects for continuous factors are incorrectly looking quadratic.
Here's a simple reprex for a Poisson GLM:
Created on 2022-11-03 by the reprex package (v2.0.1)
I'm getting similar issues with Gaussian and binomial GLMs. Using version 0.8.1 from CRAN.