We analyzed permutation feature importance (PFI) using the model_parts function in the R package DALEX.
We built a classification model; however, we selected loss_root_mean_square for the loss_function argument because of the large number of features (182) relative to the small number of instances (117).
Many of the resulting RMSE-based importance values are negative. We attached the results, scripts and the data (zip file).
We would like to know why these values are being output.
Several things in your modeling process are suboptimal, and most of them come down to inconsistent loss functions: the random forest probably optimizes Gini impurity, the grid search optimizes ROC AUC, and the permutation importance evaluates RMSE. A cleaner workflow would use the same loss in all three places. With the ranger backend for the random forest, you can minimize log loss during fitting, during grid search, and during permutation importance. (This is also the loss used by logistic regression.) With the "rf" backend, the fitting loss is probably fixed, so one inconsistency will remain.
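As a rough illustration of using one loss throughout, the base-R sketch below fits a logistic regression (which minimizes log loss) and then computes permutation importance with the same log loss, reported as a difference from the full-model loss in the spirit of type = "difference". Everything here is an assumption for illustration: the data are synthetic, glm stands in for the random forest, and logloss and perm_importance are hypothetical helper names, not DALEX functions. It also shows why negative values can appear: for a feature the model barely uses, the difference is just noise around zero and may come out slightly negative, especially on a small test set.

```r
set.seed(1)

# Synthetic binary data: x1 is informative, x2 and x3 are pure noise.
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(2 * dat$x1))

train <- dat[1:100, ]
test  <- dat[101:200, ]

# Logistic regression minimizes log loss during fitting.
fit <- glm(y ~ ., data = train, family = binomial)

# Log loss with an (observed, predicted) argument order.
# Probabilities are clipped to avoid log(0).
logloss <- function(observed, predicted, eps = 1e-15) {
  p <- pmin(pmax(predicted, eps), 1 - eps)
  -mean(observed * log(p) + (1 - observed) * log(1 - p))
}

# Hypothetical helper: mean permuted loss minus full-model loss, per feature.
perm_importance <- function(model, data, y, loss, B = 5) {
  features <- setdiff(names(data), "y")
  full <- loss(y, predict(model, data, type = "response"))
  sapply(features, function(f) {
    permuted_loss <- replicate(B, {
      permuted <- data
      permuted[[f]] <- sample(permuted[[f]])  # break the feature/target link
      loss(y, predict(model, permuted, type = "response"))
    })
    mean(permuted_loss) - full
  })
}

imp <- perm_importance(fit, test, test$y, logloss)
print(round(imp, 4))
```

The informative feature should show a clearly positive difference, while the noise features should stay close to zero in either direction. A function with this (observed, predicted) signature could also be passed as loss_function to model_parts.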
Hello!
Here is the code.
library(tidymodels)
library(DALEX)
library(caret)
library(breakDown)
library(ggrepel)
count_score_L <- read.delim("C:/rfolder/20231118_test/ensemble_count_learning.txt")
count_score_T <- read.delim("C:/rfolder/20231118_test/ensemble_count_test.txt")
count_score_L$which_g <- as.factor(count_score_L$which_g)
count_score_T$which_g <- as.factor(count_score_T$which_g)
ctrl <- trainControl(
  method = "repeatedcv",
  number = 10, repeats = 10,
  selectionFunction = "best",
  savePredictions = TRUE,
  classProbs = TRUE,
  summaryFunction = twoClassSummary
)
classif_rf <- train(
  which_g ~ ., data = count_score_L, method = "rf",
  metric = "ROC", trControl = ctrl
)
p_fun <- function(object, newdata) {
  predict(object, newdata = newdata, type = "prob")[, 2]
}
count_score_T$which_g <- as.numeric(ifelse(count_score_T$which_g == "group_A", 1, 0))
yTest <- as.numeric(count_score_T$which_g)
explainer_classif_rf <- DALEX::explain(
  classif_rf, label = "rf",
  data = count_score_T, y = yTest,
  predict_function = p_fun,
  verbose = FALSE
)
pfi_classif_rf <- explainer_classif_rf %>%
  model_parts(
    loss_function = loss_root_mean_square,
    B = 5,
    type = "difference"
  )
plot(pfi_classif_rf)
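One thing that may be worth checking in the code above is whether the predicted probability and the numeric target refer to the same class. This is an assumption, since the factor levels of which_g are not shown: p_fun takes column 2 of the probability output, which corresponds to the second factor level, while yTest codes group_A as 1. If group_A happens to be the first level, predictions and targets point in opposite directions, and permuting an informative feature would then reduce the RMSE and give negative importances. The base-R sketch below uses made-up probabilities and a hypothetical second class name, group_B, to show the size of the effect.

```r
# Hypothetical labels; factor levels are sorted alphabetically by default,
# so group_A is the FIRST level here.
truth <- factor(rep(c("group_A", "group_B"), 25))
print(levels(truth))

# Made-up, fairly accurate class probabilities, one column per level.
p_A <- ifelse(truth == "group_A", 0.8, 0.2)
prob <- data.frame(group_A = p_A, group_B = 1 - p_A)

# Same target coding as in the question: group_A -> 1, otherwise 0.
y <- as.numeric(truth == "group_A")

rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

# Selecting by position picks the probability of the second level.
rmse_by_position <- rmse(y, prob[, 2])
# Selecting by name picks the probability of the class coded as 1.
rmse_by_name <- rmse(y, prob[, "group_A"])

print(c(by_position = rmse_by_position, by_name = rmse_by_name))
```

If this turns out to apply, selecting the column by name inside p_fun, for example [, "group_A"], would keep the prediction aligned with yTest. Separately, data = count_score_T still contains which_g, so model_parts will also permute the target column; the DALEX documentation asks for the data to be passed without the target variable.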
20231118_test.zip