Multiclass Brier score seems incorrect #600

Open
sibipx opened this issue Jan 28, 2022 · 2 comments

sibipx commented Jan 28, 2022

The multiclass Brier score seems incorrect, although it is correct for two classes (binary outcomes). I compared the results with multiclass.Brier from the measures package, which follows Brier's original definition (https://journals.ametsoc.org/view/journals/mwre/78/1/1520-0493_1950_078_0001_vofeit_2_0_co_2.xml).

The discrepancy is not explained by dividing by 2 (or by the number of classes) relative to the original definition.

I would appreciate it if you could let me know whether this is intentional (another flavour of the Brier score?). I looked at the C++ code for the calculation, and it seemed to me to be built with the binary Brier score in mind, but I am not proficient in C++.

Thanks!

library(ranger)
# use multiclass.Brier from library measures for comparison
library(measures)
# use diamonds dataset from ggplot2
library(ggplot2)

# on iris data
data(iris)
RF <- ranger(Species ~ ., data = iris, probability = TRUE)

RF$prediction.error
multiclass.Brier(RF$predictions, iris$Species) / length(levels(iris$Species))
multiclass.Brier(RF$predictions, iris$Species) 

# on diamonds data
data("diamonds")
RF <- ranger(cut ~ ., data = diamonds, probability = TRUE)

RF$prediction.error
multiclass.Brier(RF$predictions, diamonds$cut) / length(levels(diamonds$cut))
multiclass.Brier(RF$predictions, diamonds$cut)

# on binary (iris)
data(iris)
iris$Species <- factor(ifelse(iris$Species == "setosa", "setosa", "other"))
RF <- ranger(Species ~ ., data = iris, probability = TRUE)

RF$prediction.error
multiclass.Brier(RF$predictions, iris$Species) / length(levels(iris$Species))
# OK: prediction.error matches multiclass.Brier divided by the number of classes

# on binary(diamonds)
data("diamonds")
diamonds$cut <- factor(ifelse(diamonds$cut == "Ideal", "Ideal", "other"))
RF <- ranger(cut ~ ., data = diamonds, probability = TRUE)

RF$prediction.error
multiclass.Brier(RF$predictions, diamonds$cut) / length(levels(diamonds$cut))
# OK: prediction.error matches multiclass.Brier divided by the number of classes

mnwright commented Mar 3, 2022

Good point, thanks.

We are calculating:

$$\frac{1}{n} \sum_{i=1}^{n} (1 - p_i)^2$$

where p_i is the predicted probability of the true class of observation i. That is equivalent to the binary Brier score but not to the multiclass Brier score implemented in measures. They calculate:

$$\frac{1}{n} \sum_{i=1}^{n} \sum_{j} (y_{ij} - p_{ij})^2$$

where y_ij = 1 if observation i has class j (else 0), and p_ij is the predicted probability of observation i for class j (from their doc).
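
A short derivation may help show why the two quantities agree up to a factor of 2 for binary outcomes but not otherwise. It uses p_i, y_ij and p_ij as defined above; K (the number of classes) and c_i (the true class of observation i) are notation introduced here for illustration only. Splitting the inner sum into the true class and the remaining classes gives:

$$\sum_{j=1}^{K} (y_{ij} - p_{ij})^2 = (1 - p_i)^2 + \sum_{j \neq c_i} p_{ij}^2$$

For K = 2 there is a single remaining class with probability 1 - p_i, so the right-hand side is 2 (1 - p_i)^2, and dividing by 2 recovers the true-class variant. For K > 2 the last term depends on how the remaining probability mass is spread over the wrong classes, so no constant rescaling makes the two scores equal in general.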

So, yes, we have another flavour of the Brier score. The advantage of our variant is that we have the same definition for binary and multiclass Brier score. The disadvantages are that (I think) it's not a strictly proper scoring rule, and that everyone else seems to be using the other variant (including Wikipedia and mlr3measures).
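
The difference between the two variants can be sketched in base R on a small made-up example. This is not ranger's or measures' code: the function names brier_true_class and brier_multiclass and the toy probabilities are hypothetical, and the sketch assumes the columns of the probability matrix are in the same order as the factor levels.

```r
# Toy data (made-up numbers): 3 observations, 3 classes, rows sum to 1.
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.6, 0.3,
                  0.2, 0.3, 0.5),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("a", "b", "c")))
truth <- factor(c("a", "b", "b"), levels = c("a", "b", "c"))

# Hypothetical helper: the true-class variant described in this thread,
# mean of (1 - p_i)^2 with p_i the probability of the true class.
# Assumes the columns of probs follow the order of levels(truth).
brier_true_class <- function(probs, truth) {
  p_true <- probs[cbind(seq_along(truth), as.integer(truth))]
  mean((1 - p_true)^2)
}

# Hypothetical helper: the multiclass definition,
# mean over observations of sum_j (y_ij - p_ij)^2.
brier_multiclass <- function(probs, truth) {
  # One-hot encode the true classes (one row per observation)
  y <- diag(nlevels(truth))[as.integer(truth), , drop = FALSE]
  mean(rowSums((y - probs)^2))
}

# Multiclass: the variants differ, and dividing by the number of
# classes does not reconcile them.
brier_true_class(probs, truth)
brier_multiclass(probs, truth)
brier_multiclass(probs, truth) / nlevels(truth)

# Binary: the variants agree after dividing by 2.
probs2 <- cbind(pos = c(0.8, 0.3), neg = c(0.2, 0.7))
truth2 <- factor(c("pos", "neg"), levels = c("pos", "neg"))
all.equal(brier_true_class(probs2, truth2),
          brier_multiclass(probs2, truth2) / 2)
```

On real ranger output the same comparison could be made by passing RF$predictions and the response factor to these helpers, again assuming the prediction columns follow the factor level order.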

In summary, I'm inclined to change it to use the binary Brier score (the first definition above) for binary classification and the multiclass Brier score (the second definition above, as in measures) for multiclass. Any other opinions?


sibipx commented Mar 4, 2022

That looks just right! The mbrier in mlr3measures agrees with the original definition from the 1950 paper. Thanks!
PS: no urgency for me to fix it, but maybe other people might benefit from the fix.
