Multiclass Brier score seems incorrect #600

sibipx · 2022-01-28T15:50:04Z

Multiclass Brier score seems incorrect. It is correct, though, for two classes (binary variables). I compared the results to multiclass.Brier from the measures package, which follows the original definition of the Brier score (https://journals.ametsoc.org/view/journals/mwre/78/1/1520-0493_1950_078_0001_vofeit_2_0_co_2.xml).

The discrepancy is not explained by a division by 2 (or by the number of classes) relative to the original definition.

I would appreciate it if you could let me know whether this is intentional (another flavour of the Brier score?) or not. I looked at the C++ code for the calculation, and it seemed to me to be built with the binary Brier score in mind, but I am not proficient in C++.

Thanks!

library(ranger)
# use multiclass.Brier from library measures for comparison
library(measures)
# use diamonds dataset from ggplot2
library(ggplot2)

# on iris data
data(iris)
RF <- ranger(Species ~ ., data = iris, probability = TRUE)

RF$prediction.error
multiclass.Brier(RF$predictions, iris$Species) / length(levels(iris$Species))
multiclass.Brier(RF$predictions, iris$Species) 

# on diamonds data
data("diamonds")
RF <- ranger(cut ~ ., data = diamonds, probability = TRUE)

RF$prediction.error
multiclass.Brier(RF$predictions, diamonds$cut) / length(levels(diamonds$cut))
multiclass.Brier(RF$predictions, diamonds$cut)

# on binary (iris)
data(iris)
iris$Species <- factor(ifelse(iris$Species == "setosa", "setosa", "other"))
RF <- ranger(Species ~ ., data = iris, probability = TRUE)

RF$prediction.error
multiclass.Brier(RF$predictions, iris$Species) / length(levels(iris$Species))
# OK

# on binary(diamonds)
data("diamonds")
diamonds$cut <- factor(ifelse(diamonds$cut == "Ideal", "Ideal", "other"))
RF <- ranger(cut ~ ., data = diamonds, probability = TRUE)

RF$prediction.error
multiclass.Brier(RF$predictions, diamonds$cut) / length(levels(diamonds$cut))
# OK


mnwright · 2022-03-03T08:31:02Z

Good point, thanks.

We are calculating:

$\frac{1}{n} \sum_{i=1}^{n} (1 - p_i)^2$

where p_i is the probability of the true class of observation i. That is equivalent to the binary Brier score but not to the multiclass Brier score implemented in measures. They calculate:

$\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{k} (y_{ij} - p_{ij})^2$

where y_ij = 1 if observation i has class j (else 0), and p_ij is the predicted probability of observation i for class j (from their doc).
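
As a small worked example of the two formulas above (the numbers are made up purely for illustration), take a single observation whose true class is the first of three classes, with predicted probabilities (0.6, 0.3, 0.1). The first formula gives

$(1 - 0.6)^2 = 0.16$

while the second gives

$(1 - 0.6)^2 + (0 - 0.3)^2 + (0 - 0.1)^2 = 0.26$

With predicted probabilities (0.6, 0.2, 0.2) the first value stays at 0.16 but the second becomes 0.24, so the two scores do not differ by a constant factor when there are more than two classes. With exactly two classes, the probability of the other class is 1 - p_i, so the inner sum collapses to

$\sum_{j=1}^{2} (y_{ij} - p_{ij})^2 = (1 - p_i)^2 + (0 - (1 - p_i))^2 = 2 (1 - p_i)^2$

which would explain why the binary results agree once the measures value is divided by 2.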

So, yes, we have another flavour of the Brier score. The advantage of our variant is that we have the same definition for binary and multiclass Brier score. The disadvantages are that (I think) it's not a strictly proper scoring rule, and that everyone else seems to be using the other variant (including Wikipedia and mlr3measures).
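
For anyone who wants to compare the two variants side by side, here is a minimal base R sketch. It is not the ranger or measures implementation; the function names brier_true_class and brier_multiclass are hypothetical, and it assumes probs is an n x k probability matrix whose column names match the class levels (which, as far as I can tell, is how RF$predictions looks for a probability forest).

```r
# Sketch only: both Brier variants from a matrix of predicted probabilities.
# probs: n x k matrix, column names assumed to equal the class levels.
# truth: factor of length n holding the true classes.

# Variant based only on the probability of the true class:
# mean over i of (1 - p_i)^2
brier_true_class <- function(probs, truth) {
  # (row, column) index pairs picking the true-class probability per row
  idx <- cbind(seq_along(truth), match(as.character(truth), colnames(probs)))
  mean((1 - probs[idx])^2)
}

# Multiclass variant: mean over i of sum over j of (y_ij - p_ij)^2
brier_multiclass <- function(probs, truth) {
  # one-hot indicator matrix: y[i, j] = 1 if observation i has class j
  y <- outer(as.character(truth), colnames(probs), "==") * 1
  mean(rowSums((y - probs)^2))
}

# Toy data (made up): two observations, both of true class "a"
probs <- rbind(c(0.6, 0.3, 0.1),
               c(0.6, 0.2, 0.2))
colnames(probs) <- c("a", "b", "c")
truth <- factor(c("a", "a"), levels = c("a", "b", "c"))

brier_true_class(probs, truth)
brier_multiclass(probs, truth)
```

On real data, something like brier_multiclass(RF$predictions, iris$Species) should be comparable to the multiclass.Brier call from the original post, provided the column-name assumption holds.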

In summary, I'm inclined to change it to use the binary Brier score for binary classification and the multiclass Brier score for multiclass. Any other opinions?

sibipx · 2022-03-04T18:14:16Z

That looks just right! The mbrier in mlr3measures agrees with the original definition from the 1950 paper. Thanks!
PS: no urgency for me to fix it, but maybe other people might benefit from the fix.
