Bagging for data valuation
This notebook introduces the Data-OOB method, using the pyDVL implementation based on the publication by Kwon and Zou, "Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value", ICML 2023.
The main objective of the paper is to overcome the computational bottleneck of Shapley-based data valuation methods, which require fitting a large number of models to accurately estimate marginal contributions. The algorithm computes data values from out-of-bag (OOB) estimates using a bagging model.
The value can be interpreted as a partition of the OOB estimate, which was originally introduced to estimate the prediction error. This OOB estimate is given as:
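In the notation of Kwon and Zou (restated here from the paper, so the symbols are assumptions relative to this notebook), with $B$ weak learners $\hat{f}_b$, $w_{bi}$ the number of times point $i$ was drawn into the $b$-th bootstrap sample, and $T$ a score function such as accuracy:

$$\sum_{i=1}^{n} \frac{\sum_{b=1}^{B} \mathbb{1}(w_{bi}=0)\, T(y_i, \hat{f}_b(x_i))}{\sum_{b=1}^{B} \mathbb{1}(w_{bi}=0)}$$

Each summand is the value assigned to the point $(x_i, y_i)$: the average score of the weak learners that did not see it during training. The per-point computation can be sketched in pure Python as follows. The 1-nearest-neighbour weak learner, the toy data and the function names are illustrative choices for this sketch, not pyDVL's API.

```python
import random


def nearest_neighbor_predict(train, x):
    """Toy weak learner: label of the closest training point (1-NN on scalars)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def data_oob(data, n_estimators=200, seed=0):
    """Return one out-of-bag value per point in `data`, a list of (x, y) pairs."""
    rng = random.Random(seed)
    n = len(data)
    score_sum = [0.0] * n  # sum of T(y_i, f_b(x_i)) over rounds where i is OOB
    oob_count = [0] * n    # number of rounds in which point i was out of bag
    for _ in range(n_estimators):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of indices
        in_bag = set(bag)
        train = [data[j] for j in bag]
        for i, (x, y) in enumerate(data):
            if i in in_bag:
                continue  # only learners that never saw point i may score it
            pred = nearest_neighbor_predict(train, x)
            score_sum[i] += 1.0 if pred == y else 0.0  # T is accuracy here
            oob_count[i] += 1
    # A point that was never out of bag has no estimate.
    return [s / c if c else float("nan") for s, c in zip(score_sum, oob_count)]


# Two well separated clusters; the last point carries a flipped label.
points = [(0.0, 0), (0.1, 0), (0.2, 0), (0.3, 0),
          (1.0, 1), (1.1, 1), (1.2, 1), (1.3, 1),
          (-0.5, 1)]
values = data_oob(points)
```

In this toy setting the mislabelled point should receive a value close to 0, because learners trained without it almost always predict the label of its neighbours, while the clean points should score close to 1.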
Setup
We begin by importing the main libraries and setting some defaults.
Variance
The variance is the variance of the weak learners. It is computed with Welford's online algorithm.
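Welford's algorithm updates the mean and the sum of squared deviations one observation at a time, so the individual scores never need to be stored. Below is a minimal sketch of the update rule. The class name is hypothetical, and treating the stream as one point's out-of-bag scores and reporting the population variance are assumptions made for illustration, not a description of pyDVL's internals.

```python
class RunningVariance:
    """Welford's online algorithm for the mean and variance of a stream."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean          # deviation from the old mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)  # uses old and new mean

    @property
    def variance(self):
        """Population variance; divide by (count - 1) for the sample variance."""
        return self.m2 / self.count if self.count else float("nan")


# Hypothetical per-learner scores for one point (1.0 = correct prediction).
scores = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
acc = RunningVariance()
for s in scores:
    acc.update(s)
```

The single-pass form is numerically more stable than accumulating the sum and the sum of squares separately, which is the usual reason for choosing it.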
Point removal experiments
The standard procedure for evaluating data valuation schemes is the point removal experiment. The objective is to measure how performance evolves when the best or worst points are removed from the training set.
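The experiment can be sketched in a few lines: rank the training points by value, remove them one at a time from either end of the ranking, and re-evaluate the model after each removal. The snippet below is a pure-Python illustration. The 1-nearest-neighbour model, the data and the data values are all made up for this sketch, and the function names are hypothetical.

```python
def knn1_accuracy(train, test):
    """Accuracy of a 1-nearest-neighbour classifier on scalar features."""
    if not train:
        return 0.0
    hits = sum(min(train, key=lambda p: abs(p[0] - x))[1] == y for x, y in test)
    return hits / len(test)


def removal_curve(train, values, test, steps, remove_best=True):
    """Test accuracy after removing 0, 1, ..., `steps` points ranked by value."""
    order = sorted(range(len(train)), key=lambda i: values[i], reverse=remove_best)
    curve = []
    for k in range(steps + 1):
        removed = set(order[:k])  # the k highest (or lowest) valued points
        kept = [p for i, p in enumerate(train) if i not in removed]
        curve.append(knn1_accuracy(kept, test))
    return curve


# Toy training set; the last point has a flipped label.
train = [(0.0, 0), (0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1), (1.0, 1), (0.3, 1)]
# Hypothetical data values: the mislabelled point is ranked lowest.
values = [0.9, 0.8, 0.7, 0.7, 0.8, 0.9, 0.05]
test = [(0.05, 0), (0.28, 0), (0.33, 0), (0.65, 1), (0.95, 1)]

worst_first = removal_curve(train, values, test, steps=1, remove_best=False)
best_first = removal_curve(train, values, test, steps=3, remove_best=True)
```

A useful valuation should make accuracy rise (or fall slowly) when low-value points are removed first, and fall quickly when high-value points are removed first.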
Created: 2023-09-12