From fdc4a0653a5ebc4907adf40a6399244a2ada337b Mon Sep 17 00:00:00 2001
From: Markus Semmler
Date: Wed, 11 Oct 2023 00:51:57 +0200
Subject: [PATCH] Add noise removal experiment to classwise-shapley.md.

---
 docs/value/classwise-shapley.md               | 107 ++++++++++++------
 .../img/classwise-shapley-metric-auc-cv.svg   |   1 +
 .../img/classwise-shapley-metric-auc-mean.svg |   1 +
 ...vg => classwise-shapley-metric-wad-cv.svg} |   0
 ... => classwise-shapley-metric-wad-mean.svg} |   2 +-
 ...se-shapley-roc-auc-logistic-regression.svg |   1 +
 6 files changed, 78 insertions(+), 34 deletions(-)
 create mode 100644 docs/value/img/classwise-shapley-metric-auc-cv.svg
 create mode 100644 docs/value/img/classwise-shapley-metric-auc-mean.svg
 rename docs/value/img/{classwise-shapley-metric-mlp-cv.svg => classwise-shapley-metric-wad-cv.svg} (100%)
 rename docs/value/img/{classwise-shapley-metric-mlp-mean.svg => classwise-shapley-metric-wad-mean.svg} (99%)
 create mode 100644 docs/value/img/classwise-shapley-roc-auc-logistic-regression.svg

diff --git a/docs/value/classwise-shapley.md b/docs/value/classwise-shapley.md
index 4b4fc1f0f..42f6d2764 100644
--- a/docs/value/classwise-shapley.md
+++ b/docs/value/classwise-shapley.md
@@ -4,12 +4,14 @@ title: Class-wise Shapley

# Class-wise Shapley

+## Introduction
+
Class-wise Shapley (CWS) [@schoch_csshapley_2022] offers a Shapley framework
tailored for classification problems. Let $D$ be a dataset, $D_{y_i}$ be the
subset of $D$ with label $y_i$, and $D_{-y_i}$ be the complement of $D_{y_i}$
in $D$. The key idea is that a sample $(x_i, y_i)$ might enhance the overall
performance on $D$, while being detrimental to the performance on $D_{y_i}$. To
-address this issue, the authors introduce the estimator
+address this issue, the authors introduced the quantity

$$
v_u(i) = \frac{1}{2^{|D_{-y_i}|}} \sum_{S_{-y_i}}
@@ -54,7 +56,7 @@ the dataset.
```

-## Class-wise scorer
+### Class-wise scorer

In order to use the classwise Shapley value, one needs to define a
[ClasswiseScorer][pydvl.value.shapley.classwise.ClasswiseScorer]. Given a sample
@@ -88,12 +90,12 @@ and $g$ for an exploration with different base scores.
)
```

The level curves for $f(x)=x$ and $g(x)=e^x$ are depicted below. The white lines
show the contours of the utility, annotated with their respective gradients.

![Level curves of the class-wise utility](img/classwise-shapley-discounted-utility-function.svg){ align=left width=33% class=invertible }
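+
+To make the role of $f$ and $g$ more concrete, here is a minimal NumPy sketch of such
+a discounted utility. It assumes that the in-class and out-of-class accuracies are
+combined multiplicatively with the defaults $f(x)=x$ and $g(x)=e^x$ used above; it is
+only an illustration, not the
+[ClasswiseScorer][pydvl.value.shapley.classwise.ClasswiseScorer] implementation.
+
+```python
+import numpy as np
+
+
+def discounted_utility(in_class_acc: float, out_of_class_acc: float,
+                       f=lambda x: x, g=np.exp) -> float:
+    """Toy class-wise utility: f(a_S(D_{y_i})) * g(a_S(D_{-y_i})).
+
+    `f` scores the accuracy on the class of the valued sample, and `g`
+    rescales it according to the performance on the remaining classes.
+    """
+    return f(in_class_acc) * g(out_of_class_acc)
+
+
+# Two subsets with the same average accuracy can receive very different utility:
+print(discounted_utility(0.9, 0.3))  # strong in-class, weak out-of-class
+print(discounted_utility(0.3, 0.9))  # weak in-class, strong out-of-class
+```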

-# Evaluation
+## Evaluation

We evaluate the method on the nine datasets used in [@schoch_csshapley_2022],
using the same pre-processing. For images, PCA is used to reduce down to 32 the
@@ -113,13 +115,15 @@ pre-processing steps, please refer to the paper.
| MNIST (binary) | Image |  2 | 32 | 554 |
| MNIST (multi)  | Image | 10 | 32 | 554 |

-## Performance for (direct) point removal
+### Performance for (direct) point removal

We compare the mean and the coefficient of variation (CV) of the weighted accuracy
drop (WAD) proposed in [@schoch_csshapley_2022]. The metric is defined by

$$
-\text{WAD} = a_T(D) - \sum_{j=1}^{n} \frac{a_{T_{-\{1 \colon j \}}}(D)}{j},
+\text{WAD} = \sum_{j=1}^{n} \left( \frac{1}{j} \sum_{i=1}^{j}
+\left[ a_{T_{-\{1 \colon i-1 \}}}(D) - a_{T_{-\{1 \colon i \}}}(D) \right] \right)
+= \sum_{j=1}^{n} \frac{a_T(D) - a_{T_{-\{1 \colon j \}}}(D)}{j},
$$

where $a_T(D)$ is the accuracy of the model (trained on $T$) evaluated on $D$ and
@@ -129,15 +133,15 @@ standard deviation $\sigma_\text{WAD}$.

The valuation of the training samples and the evaluation on the validation samples
are both computed with a logistic regression model.
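+
+As a quick illustration of the metric, the following sketch computes WAD from a
+sequence of accuracies obtained after removing the $j$ highest-valued samples. The
+accuracy values below are placeholders, not results from the benchmark.
+
+```python
+import numpy as np
+
+
+def weighted_accuracy_drop(accuracies: np.ndarray) -> float:
+    """WAD from accuracies[j] = a_{T_{-{1:j}}}(D), where accuracies[0] is the
+    accuracy of the model trained on the full training set T."""
+    a_full, a_removed = accuracies[0], accuracies[1:]
+    j = np.arange(1, len(a_removed) + 1)
+    return float(np.sum((a_full - a_removed) / j))
+
+
+# Placeholder accuracies after removing 0, 1, ..., 5 of the most valuable samples.
+accs = np.array([0.81, 0.78, 0.74, 0.69, 0.65, 0.62])
+print(weighted_accuracy_drop(accs))
+# The CV reported below is np.std(wads) / np.mean(wads) over repeated runs.
+```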

Let's have a look at the mean of the WAD metric:

-![WAD drop (Mean)](img/classwise-shapley-metric-mlp-mean.svg){ align=left width=50% class=invertible }
+![Weighted accuracy drop (Mean)](img/classwise-shapley-metric-wad-mean.svg){ align=left width=50% class=invertible }

The table shows that CWS is competitive with all three other methods. In all
problems except `MNIST (multi)` it is better than TMCS; in that case TMCS has a
slight advantage.

Another important quantity is the CV $\frac{\sigma_\text{WAD}}{\mu_\text{WAD}}$.
It normalizes the standard
-deviation relative to the mean. The results are shown below.
+deviation by the mean. The results are shown below.

-![WAD drop (CV)](img/classwise-shapley-metric-mlp-cv.svg){ align=left width=50% class=invertible }
+![Weighted accuracy drop (CV)](img/classwise-shapley-metric-wad-cv.svg){ align=left width=50% class=invertible }

It is noteworthy that CWS is not the best method in terms of CV (a lower CV means
better performance). For `CIFAR10`, `Click`, `CPU` and `MNIST (binary)` Beta Shapley has the
@@ -146,20 +150,24 @@ lowest CV. For `Diabetes`, `MNIST (multi)` and `Phoneme` CWS is the winner and f
highest relative standard deviation.

The following plot shows valuation-set accuracy of logistic regression on the y-axis.
-The x-axis shows the number of samples removed. Random values serve as a baseline.
+The x-axis shows the number of samples removed. Random values serve as a baseline.
+Each line summarizes five runs, and bootstrapping was used to estimate the 95%
+confidence intervals.
+
![Accuracy after sample removal using values from logistic regression](img/classwise-shapley-weighted-accuracy-drop-logistic-regression-to-logistic-regression.svg){ class=invertible }

-Overall we conclude that in terms of mean WAD CWS and TMCS are the best methods. In
-terms of the CV CWS and Beta Shapley are the clear winners. Hence, CWS is a competitive
-method for valuation of data sets with a low relative standard deviation. We remark that
-for all valuation methods the same number of _evaluations of the marginal utility_ was
-used.
+Samples are removed from high to low valuation order, and hence we expect a steep
+decrease in the curve. Overall, we conclude that in terms of mean WAD, CWS and TMCS are
+the best methods. In terms of CV, CWS and Beta Shapley are the clear winners. Hence,
+CWS is a competitive valuation method with a low relative standard deviation. We remark
+that the same number of _evaluations of the marginal utility_ was used for all
+valuation methods.

-## Performance in value transfer for point removal
+### Performance in value transfer for point removal

-Practically more relevant is transfer of values for one model to another one. As before
-the values are calculated using a logistic regression model. However, here they are
+More relevant in practice is the transfer of values from one model to another. As
+before, the values are calculated using logistic regression. However, this time they are
used to prune the training set for a neural network. The following plot shows
valuation-set accuracy of the network on the y-axis, and the number of samples removed
on the x-axis.
@@ -167,31 +175,64 @@

![Accuracy after sample removal using values transferred from logistic regression to an MLP](img/classwise-shapley-weighted-accuracy-drop-logistic-regression-to-mlp.svg){ class=invertible }

-Samples are removed from high to low valuation order and hence we expect a steep
+Again, samples are removed from high to low valuation order and hence we expect a steep
decrease in the curve. CWS is competitive with the compared methods. Especially
in very unbalanced datasets, like `Click`, the performance of CWS seems superior.
In other datasets, like `Covertype`, `Diabetes` and `MNIST (multi)`, the
performance is on par with TMCS. For `MNIST (binary)` and `Phoneme` the
-performance is competitive. We remark that for all valuation methods the
-same number of _evaluations of the marginal utility_ was used.
+performance is competitive.

-## Density of values
+### Density of values

-Last but not least let's compare the distribution of values for TMCS (green) and CWS
-(red). Therefore, the following plots show a histogram the density estimated by kernel
-density estimation (KDE).
+This experiment compares the distribution of values for TMCS (green) and CWS
+(red). Both methods are chosen due to their competitiveness. The following plots show a
+histogram as well as the density estimated by kernel density estimation (KDE).

![Density of TMCS and CWS](img/classwise-shapley-density.svg){ class=invertible }

-As the metrics already suggest TMCS has a higher variance as CWS. In mean, they
-seem to approximate the same quantity, which is not obvious due to their different
-nature of their utility functions.
+As is already apparent from the CV metric in the previous section, the variance of CWS
+is lower than that of TMCS. Both methods seem to approximate the same form of
+distribution, although their utility functions are different.

For `Click`, TMCS has a multi-modal distribution of values. This is inferior to CWS,
which has only one mode and is more stable on that dataset. `Click` is a very unbalanced
-dataset and hence CWS seems to be more robust on unbalanced datasets. It seems that
-CWS is a good way to handle classification problems. Given the underlying similarities
-in the architecture of TMCS, Beta Shapley, and CWS algorithms, there's a clear pathway
-for improving convergence rates, sample efficiency, and stabilize variance in all of
-these methods.
\ No newline at end of file
+dataset, and we conclude that CWS seems to be more robust on unbalanced datasets.
+
+### Noise removal with 20% flipped labels
+
+Another type of experiment uses the valuation methods to detect mis-labelled data
+points: the labels of 20% of the samples are flipped, with the indices chosen at
+random. Multi-class datasets are discarded, because they do not possess a unique
+flipping strategy. The following table shows the mean of the area under the curve
+(AUC) over five runs.
+
+![Area under the Curve (Mean)](img/classwise-shapley-metric-auc-mean.svg){ align=left width=50% class=invertible }
+
+In the majority of cases TMCS has a slight advantage over CWS on average. For
+`Click`, CWS has a slight edge, most probably due to the unbalanced nature of the
+dataset. The following plot shows the CV of the AUC over the five runs.
+
+![Area under the Curve (CV)](img/classwise-shapley-metric-auc-cv.svg){ align=left width=50% class=invertible }
+
+In terms of CV, CWS has a clear edge over TMCS and Beta Shapley. The following plot
+shows the receiver operating characteristic (ROC) for the mean of five runs.
+
+![Receiver Operating Characteristic](img/classwise-shapley-roc-auc-logistic-regression.svg){ align=left width=50% class=invertible }
+
+The ROC curve plots the true positive rate (TPR) against the false positive rate
+(FPR); here, a positive is a flipped sample. The TPR is the ratio of correctly
+identified flipped samples to all flipped samples, and the FPR is the ratio of clean
+samples incorrectly flagged as flipped to all clean samples. This pair is calculated
+for all prefixes of the training set, ordered by value. Although TMCS seems to be the
+winner, CWS stays competitive when sample efficiency is taken into account: it is
+almost on par with TMCS while requiring fewer samples on average. For a perfectly
+balanced dataset, CWS needs fewer samples than TMCS.
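+
+As a rough sketch of how such a ROC curve and its AUC can be computed, the snippet
+below uses scikit-learn on synthetic placeholder data. It assumes that flipped
+samples tend to receive lower values, so ranking the training set by increasing value
+amounts to scoring each sample with its negated value.
+
+```python
+import numpy as np
+from sklearn.metrics import roc_auc_score, roc_curve
+
+rng = np.random.default_rng(42)
+
+n = 500
+is_flipped = np.zeros(n, dtype=bool)
+is_flipped[rng.choice(n, size=n // 5, replace=False)] = True  # 20% flipped labels
+
+# Placeholder values standing in for the output of a valuation method: flipped
+# samples are assumed to receive lower values on average.
+values = rng.normal(loc=np.where(is_flipped, -0.5, 0.5), scale=1.0)
+
+scores = -values  # low value -> high suspicion of a flipped label
+fpr, tpr, _ = roc_curve(is_flipped, scores)  # (fpr, tpr) traces the ROC curve
+print(f"AUC for detecting flipped labels: {roc_auc_score(is_flipped, scores):.3f}")
+```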
+
+## Conclusion
+
+CWS is a reasonable and effective way to handle classification problems. It reduces
+the required computing power and the variance by splitting the data set into classes.
+Given the underlying similarities in the architecture of TMCS, Beta Shapley, and CWS,
+there is a clear pathway towards improving convergence rates, sample efficiency, and
+variance stabilization for TMCS and Beta Shapley.
+
diff --git a/docs/value/img/classwise-shapley-metric-auc-cv.svg b/docs/value/img/classwise-shapley-metric-auc-cv.svg
new file mode 100644
index 000000000..3ddc5f5a4
--- /dev/null
+++ b/docs/value/img/classwise-shapley-metric-auc-cv.svg
@@ -0,0 +1 @@
+
\ No newline at end of file
diff --git a/docs/value/img/classwise-shapley-metric-auc-mean.svg b/docs/value/img/classwise-shapley-metric-auc-mean.svg
new file mode 100644
index 000000000..197ada82b
--- /dev/null
+++ b/docs/value/img/classwise-shapley-metric-auc-mean.svg
@@ -0,0 +1 @@
+
\ No newline at end of file
diff --git a/docs/value/img/classwise-shapley-metric-mlp-cv.svg b/docs/value/img/classwise-shapley-metric-wad-cv.svg
similarity index 100%
rename from docs/value/img/classwise-shapley-metric-mlp-cv.svg
rename to docs/value/img/classwise-shapley-metric-wad-cv.svg
diff --git a/docs/value/img/classwise-shapley-metric-mlp-mean.svg b/docs/value/img/classwise-shapley-metric-wad-mean.svg
similarity index 99%
rename from docs/value/img/classwise-shapley-metric-mlp-mean.svg
rename to docs/value/img/classwise-shapley-metric-wad-mean.svg
index 6dc40687f..7f74a384a 100644
--- a/docs/value/img/classwise-shapley-metric-mlp-mean.svg
+++ b/docs/value/img/classwise-shapley-metric-wad-mean.svg
@@ -1 +1 @@
-
\ No newline at end of file
+
\ No newline at end of file
diff --git a/docs/value/img/classwise-shapley-roc-auc-logistic-regression.svg b/docs/value/img/classwise-shapley-roc-auc-logistic-regression.svg
new file mode 100644
index 000000000..0ec200f83
--- /dev/null
+++ b/docs/value/img/classwise-shapley-roc-auc-logistic-regression.svg
@@ -0,0 +1 @@
+
\ No newline at end of file