Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/filter converged #437

Merged
merged 23 commits into from
Oct 8, 2023
Merged

Feature/filter converged #437

merged 23 commits into from
Oct 8, 2023

Conversation

mdbenito
Copy link
Collaborator

@mdbenito mdbenito commented Sep 22, 2023

Description

This PR closes #303

Changes

  • Adds an additional parameter to compute_generic_semivalues that allows skipping samples for indices that have already converged
  • Improves documentation and warns about that parameter (it's tricky)

Unrelated:

  • Deprecates a custom attribute in StoppingCriterion in favour of __str__

Checklist

  • Wrote Unit tests (if necessary)
  • Updated Documentation (if necessary)
  • Updated Changelog
  • If notebooks were added/changed, added boilerplate cells are tagged with "tags": ["hide"] or "tags": ["hide-input"]

@mdbenito mdbenito requested a review from kosmitive September 22, 2023 17:25
@kosmitive
Copy link
Contributor

kosmitive commented Sep 22, 2023

Can you elaborate, why this parameter is necessary? Instead of doing that, we can also try to exclude marginal evaluations for converged indices, by removing them from the task queue.

@mdbenito
Copy link
Collaborator Author

mdbenito commented Sep 22, 2023

Can you elaborate, why this parameter is necessary? Instead of doing that, we can also try to exclude marginal evaluations for converged indices, by removing them from the task queue.

Yes, sorry. This is exactly what is done. The parameter is necessary because just making it the default would change the behaviour for users. Also, very often values would converge only because of "unexpected" additional marginal computations.

This is something that happens often: we do AbsoluteStandardError(0.02) | MaxUpdates(1000). Then some values might fulfill the first criterion really fast, despite being very bad estimates. Because the other indices take much longer to have low stderr and done(result) is a global check, the "converged" ones keep being updated and end up being good estimates. If we stopped updating them, the values would be off by a lot. The way to fix this is by making the stderr threshold much lower, e.g. AbsoluteStandardError(1e-3) | MaxUpdates(1000). With skip_converged=True this more stringent check still takes less time than the first one.

@mdbenito mdbenito self-assigned this Sep 22, 2023
@mdbenito mdbenito added this to the v0.7.2 milestone Sep 22, 2023
@mdbenito mdbenito marked this pull request as ready for review September 22, 2023 21:24
src/pydvl/value/semivalues.py Show resolved Hide resolved
src/pydvl/value/semivalues.py Show resolved Hide resolved
src/pydvl/value/stopping.py Outdated Show resolved Hide resolved
tests/value/test_semivalues.py Show resolved Hide resolved
Copy link
Collaborator Author

@mdbenito mdbenito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kosmitive for the comments. Implementing a couple

src/pydvl/value/stopping.py Outdated Show resolved Hide resolved
src/pydvl/value/semivalues.py Show resolved Hide resolved
tests/value/test_semivalues.py Show resolved Hide resolved
@mdbenito
Copy link
Collaborator Author

This is something that happens often: we do AbsoluteStandardError(0.02) | MaxUpdates(1000). Then some values might fulfill the first criterion really fast, despite being very bad estimates. Because the other indices take much longer to have low stderr and done(result) is a global check, the "converged" ones keep being updated and end up being good estimates. If we stopped updating them, the values would be off by a lot. The way to fix this is by making the stderr threshold much lower, e.g. AbsoluteStandardError(1e-3) | MaxUpdates(1000). With skip_converged=True this more stringent check still takes less time than the first one.

Added something along these lines to the docs for stopping.py

@mdbenito mdbenito merged commit 4c10cc3 into develop Oct 8, 2023
12 checks passed
@mdbenito mdbenito deleted the feature/filter-converged branch October 8, 2023 13:26
@mdbenito mdbenito modified the milestones: v0.7.2, v0.7.1 Oct 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow stopping criteria that operate on indices
2 participants