Commit fdc4a06 — Add noise removal to README.md.
Markus Semmler committed Oct 10, 2023 · 1 parent 37efbb8
Showing 6 changed files with 78 additions and 34 deletions.
107 changes: 74 additions & 33 deletions docs/value/classwise-shapley.md
---
title: Class-wise Shapley
---

# Class-wise Shapley

## Introduction

Class-wise Shapley (CWS) [@schoch_csshapley_2022] offers a Shapley framework
tailored for classification problems. Let $D$ be a dataset, $D_{y_i}$ be the
subset of $D$ with labels $y_i$, and $D_{-y_i}$ be the complement of $D_{y_i}$
in $D$. The key idea is that a sample $(x_i, y_i)$ might enhance the overall
performance on $D$, while being detrimental for the performance on $D_{y_i}$. To
address this issue, the authors introduced the estimator

$$
v_u(i) = \frac{1}{2^{|D_{-y_i}|}} \sum_{S_{-y_i} \subseteq D_{-y_i}}
\frac{1}{|D_{y_i}|!} \sum_{\sigma \in \Pi(D_{y_i})}
\left[ u(\sigma_{:i} \cup \{i\} \mid S_{-y_i}) - u(\sigma_{:i} \mid S_{-y_i}) \right],
$$

where $\sigma_{:i}$ denotes the in-class samples preceding $i$ in the permutation
$\sigma$, and $u(\cdot \mid S_{-y_i})$ is the utility of a set of in-class samples
evaluated together with the fixed out-of-class subset $S_{-y_i}$.

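The estimator lends itself to simple Monte Carlo approximation: repeatedly draw a
uniformly random out-of-class subset and a random set of in-class predecessors for the
sample, then average its marginal contributions. The following is a minimal sketch of
that idea; all names are illustrative and not pydvl's API, and
`utility(in_class, out_of_class)` is a hypothetical stand-in for the conditional
utility.

```python
import random

def cws_value(i, D_y, D_minus_y, utility, n_samples=100, rng=None):
    """Monte Carlo sketch of the class-wise Shapley value of sample `i`.

    Each draw picks a uniform subset of the out-of-class data (one coin
    flip per element, matching the 1/2^{|D_{-y_i}|} weighting) and a
    random set of in-class predecessors of `i`, then accumulates the
    marginal contribution of `i` conditioned on the out-of-class subset.
    """
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n_samples):
        s_out = [x for x in D_minus_y if rng.random() < 0.5]
        others = [x for x in D_y if x != i]
        rng.shuffle(others)
        preds = others[:rng.randrange(len(others) + 1)]  # in-class predecessors
        total += utility(preds + [i], s_out) - utility(preds, s_out)
    return total / n_samples
```

With the toy utility `len(in_class)` every marginal contribution equals one, so the
value of any sample comes out as exactly 1.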
### Class-wise scorer

In order to use the classwise Shapley value, one needs to define a
[ClasswiseScorer][pydvl.value.shapley.classwise.ClasswiseScorer]. Given a sample
$(x_i, y_i)$, it computes a discounted utility of the form
$u(S) = f(a_S(D_{y_i})) \, g(a_S(D_{-y_i}))$, where $a_S(D_{y_i})$ and
$a_S(D_{-y_i})$ denote the in-class and out-of-class accuracies of the model trained on
$S$, and $f$ and $g$ are monotonic discounting functions. See
[@schoch_csshapley_2022] for an exploration with different base scores.
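The core of this scorer can be sketched as the product $f(a_\text{in}) \cdot
g(a_\text{out})$, with the defaults $f(x)=x$ and $g(x)=e^x$ used in the plots below.
The function name is illustrative and not pydvl's API.

```python
import math

def classwise_utility(in_class_acc, out_of_class_acc,
                      f=lambda x: x, g=math.exp):
    """Discounted class-wise utility u = f(a_in) * g(a_out).

    With the defaults, high out-of-class accuracy amplifies the in-class
    score multiplicatively, rewarding samples that help their own class
    without hurting the other classes.
    """
    return f(in_class_acc) * g(out_of_class_acc)
```

For instance, an in-class accuracy of 0.5 with zero out-of-class accuracy yields a
utility of 0.5, since $g(0)=1$.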

The level curves for $f(x)=x$ and $g(x)=e^x$ are depicted below, with the contour
lines annotated with their respective gradients.

![Level curves of the class-wise utility](img/classwise-shapley-discounted-utility-function.svg){ align=left width=33% class=invertible }

## Evaluation

We evaluate the method on the nine datasets used in [@schoch_csshapley_2022],
using the same pre-processing. For images, PCA is used to reduce the number of
features down to 32. For more details on the pre-processing steps, please refer
to the paper.

| Dataset        | Data Type | Classes | Input Dims | OpenML ID |
|----------------|-----------|---------|------------|-----------|
| MNIST (binary) | Image     | 2       | 32         | 554       |
| MNIST (multi)  | Image     | 10      | 32         | 554       |

### Performance for (direct) point removal

We compare the mean and the coefficient of variation (CV) of the weighted accuracy drop
(WAD) as proposed in [@schoch_csshapley_2022]. The metric is defined by

$$
\text{WAD} = \sum_{j=1}^{n} \left ( \frac{1}{j} \sum_{i=1}^{j}
a_{T_{-\{1 \colon i-1 \}}}(D) - a_{T_{-\{1 \colon i \}}}(D) \right)
= a_T(D) - \sum_{j=1}^{n} \frac{a_{T_{-\{1 \colon j \}}}(D)}{j} ,
$$

where $a_T(D)$ is the accuracy of the model (trained on $T$) evaluated on $D$ and
$T_{-\{1 \colon j \}}$ is the set $T$ with the $j$ most valuable samples removed. We
report the mean $\mu_\text{WAD}$ and the standard deviation $\sigma_\text{WAD}$ over
the runs. The valuation of the training samples and the evaluation on the validation
samples are both calculated based on a logistic regression model. Let's have a look
at the mean

![Weighted accuracy drop (Mean)](img/classwise-shapley-metric-wad-mean.svg){ align=left width=50% class=invertible }

of the metric WAD. The plot shows that CWS is competitive with all three other methods.
In all problems except `MNIST (multi)` it is better than TMCS, whereas in that
case TMCS has a slight advantage. Another important quantity is the CV
$\frac{\sigma_\text{WAD}}{\mu_\text{WAD}}$. It normalizes the standard
deviation by the mean. The results are shown below.

![Weighted accuracy drop (CV)](img/classwise-shapley-metric-wad-cv.svg){ align=left width=50% class=invertible }
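Numerically, the telescoped form of WAD is a one-liner over the sequence of
accuracies; a sketch, assuming `accuracies[j]` holds the accuracy after the $j$ most
valuable samples have been removed (the function name is illustrative):

```python
def wad(accuracies):
    """Weighted accuracy drop via the telescoped form
    WAD = a_T(D) - sum_{j=1}^{n} a_{T_{-{1:j}}}(D) / j.

    accuracies[0] is the accuracy on the full training set, and
    accuracies[j] the accuracy after removing the j most valuable points.
    """
    a_full = accuracies[0]
    return a_full - sum(a / j for j, a in enumerate(accuracies[1:], start=1))
```

For the sequence `[1.0, 0.5, 0.25]` this gives `1.0 - (0.5/1 + 0.25/2) = 0.375`.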

It is noteworthy that CWS is not the best method in terms of CV (lower CV means better
performance). For `CIFAR10`, `Click`, `CPU` and `MNIST (binary)` Beta Shapley has the
lowest CV. For `Diabetes`, `MNIST (multi)` and `Phoneme` CWS is the winner, and for the
remaining datasets TMCS has the highest relative standard deviation.

The following plot shows valuation-set accuracy of logistic regression on the y-axis.
The x-axis shows the number of samples removed. Random values serve as a baseline.
Each line represents five runs, and bootstrapping was used to estimate the 95%
confidence intervals.


![Accuracy after sample removal using values from logistic regression](img/classwise-shapley-weighted-accuracy-drop-logistic-regression-to-logistic-regression.svg){ class=invertible }

Samples are removed from high to low valuation order and hence we expect a steep
decrease in the curve. Overall we conclude that in terms of mean WAD, CWS and TMCS are
the best methods. In terms of CV, CWS and Beta Shapley are the clear winners. Hence, CWS
is a competitive method for valuation of datasets with a low relative standard
deviation. We remark that for all valuation methods the same number of _evaluations of
the marginal utility_ was used.
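The removal experiment itself can be sketched as a loop over prefixes of the removal
order; `fit_and_score` is a hypothetical stand-in for retraining the model on the
remaining samples and evaluating it on the valuation set:

```python
def removal_curve(values, train_set, fit_and_score, step=1):
    """Point-removal sketch: drop samples from highest to lowest value
    and re-evaluate the model after each removal."""
    order = sorted(range(len(train_set)), key=lambda i: values[i], reverse=True)
    curve = []
    for n_removed in range(0, len(train_set) + 1, step):
        keep = [train_set[i] for i in order[n_removed:]]
        curve.append(fit_and_score(keep))
    return curve
```

The same sketch covers the value-transfer setting of the next section: compute
`values` with a cheap model and pass a `fit_and_score` that trains the expensive one.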

### Performance in value transfer for point removal

Practically more relevant is the transfer of values from one model to another. As
before, the values are calculated using logistic regression. However, this time they
are used to prune the training set for a neural network. The following plot shows
valuation-set accuracy of the network on the y-axis, and the number of samples removed
on the x-axis.

![Accuracy after sample removal using values transferred from logistic regression
to an MLP](img/classwise-shapley-weighted-accuracy-drop-logistic-regression-to-mlp.svg){ class=invertible }

Again samples are removed from high to low valuation order and hence we expect a steep
decrease in the curve. CWS is competitive with the compared methods. Especially
in very unbalanced datasets, like `Click`, the performance of CWS seems
superior. In other datasets, like `Covertype`, `Diabetes` and `MNIST (multi)`,
the performance is on par with TMCS. For `MNIST (binary)` and `Phoneme` the
performance is competitive.

### Density of values

This experiment compares the distribution of values for TMCS (green) and CWS
(red). Both methods are chosen due to their competitiveness. The following plots show a
histogram as well as the density estimated by kernel density estimation (KDE).


![Density of TMCS and CWS](img/classwise-shapley-density.svg){ class=invertible }

As the CV metric from the previous section already suggests, the variance of CWS is
lower than that of TMCS. They seem to approximate the same form of distribution,
although their utility functions are different.

For `Click` TMCS has a multi-modal distribution of values. This is inferior to CWS,
which has only one mode and is more stable on that dataset. `Click` is a very unbalanced
dataset, and we conclude that CWS seems to be more robust on unbalanced datasets.
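The density estimates in such plots can be sketched with a hand-rolled Gaussian KDE
(in practice one would use e.g. `scipy.stats.gaussian_kde`; the bandwidth here is an
arbitrary illustration):

```python
import math

def gaussian_kde(samples, bandwidth=0.1):
    """Return a density function built by placing a Gaussian kernel of
    width `bandwidth` on every sample and averaging them."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)

    return density
```

Evaluating the returned function on a grid of values for TMCS and for CWS gives the
two curves overlaid on the histograms.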

### Noise removal for 20% flipped labels

Another type of experiment uses the algorithms to detect mis-labelled data points: the
labels of 20% of the samples, with indices chosen randomly, are flipped. Multi-class
datasets are discarded, because they do not possess a unique flipping strategy. The
following plot shows the mean of the area under the curve (AUC) of five runs.
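The corruption step can be sketched as follows for binary labels; the 20% fraction
matches the experiment, while the function name and return convention are
illustrative:

```python
import random

def flip_labels(labels, fraction=0.2, rng=None):
    """Flip `fraction` of the binary labels at random, returning the new
    labels and the set of flipped indices (the detection ground truth)."""
    rng = rng or random.Random(0)
    idx = rng.sample(range(len(labels)), int(fraction * len(labels)))
    flipped = list(labels)
    for i in idx:
        flipped[i] = 1 - flipped[i]
    return flipped, set(idx)
```

The valuation methods are then scored on how well low values single out the flipped
indices.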

![Area under the Curve (Mean)](img/classwise-shapley-metric-auc-mean.svg){ align=left width=50% class=invertible }

In the majority of cases TMCS has a slight advantage over CWS on average. For
`Click` CWS has a slight edge, most probably due to the unbalanced nature of the
dataset. The following plot shows the CV for the AUC of the five runs.

![Area under the Curve (CV)](img/classwise-shapley-metric-auc-cv.svg){ align=left width=50% class=invertible }

In terms of CV, CWS has a clear edge over TMCS and Beta Shapley. The following plot
shows the receiver operating characteristic (ROC) for the mean of five runs.

![Receiver Operating Characteristic](img/classwise-shapley-roc-auc-logistic-regression.svg){ align=left width=50% class=invertible }

The ROC curve is a plot of the true positive rate (TPR) against the false positive rate
(FPR). The TPR is the ratio of correctly classified positive samples to all positive
samples; the FPR is the ratio of incorrectly classified negative samples to all
negative samples. This pair is calculated for all prefixes of the training set, ordered
by value. Although TMCS seems to be the winner, CWS is almost on par while requiring
fewer samples on average; for a perfectly balanced dataset, CWS needs fewer samples
than TMCS.
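This prefix construction can be sketched directly; mislabelled points are the positive
class and, since low values should indicate noise, points are inspected from lowest to
highest value:

```python
def roc_points(values, is_mislabelled):
    """Return the (FPR, TPR) pairs obtained after each prefix of the
    training set, ordered by increasing value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_pos = sum(is_mislabelled)            # mislabelled = positive class
    n_neg = len(values) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if is_mislabelled[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points
```

A method whose low values coincide with the flipped labels reaches a TPR of 1 at a low
FPR, i.e. a curve hugging the upper-left corner.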

## Conclusion

CWS is a reasonable and effective way to handle classification problems. It reduces the
computing power and variance by splitting up the data set into classes. Given the
underlying similarities in the architecture of TMCS, Beta Shapley, and CWS, there's a
clear pathway for improving convergence rates, sample efficiency, and stabilize variance
for TMCS and Beta Shapley.

1 change: 1 addition & 0 deletions docs/value/img/classwise-shapley-metric-auc-cv.svg
1 change: 1 addition & 0 deletions docs/value/img/classwise-shapley-metric-auc-mean.svg
