- Cleanup: removed unnecessary decorator `@unpackable` PR #233
- Stopping criteria: fixed problem with `StandardError` and enabled proper composition of index convergence statuses. Fixed a bug with `n_jobs` in `truncated_montecarlo_shapley`. PR #300
- Shuffling code around to allow for simpler user imports, some cleanup and documentation fixes. PR #284
- Bug fix: Warn instead of raising an error when `n_iterations` is less than the size of the dataset in Monte Carlo Least Core PR #281
- Fixed parallel and antithetic Owen sampling for Shapley values. Simplified and extended tests. PR #267
- Added `Scorer` class for a cleaner interface. Fixed minor bugs around Group-Testing Shapley, added more tests and switched to cvxpy for the solver. PR #264
- Generalised stopping criteria for valuation algorithms. Improved classes `ValuationResult` and `Status` with more operations. Some minor issues fixed. PR #252
- Fixed a bug whereby `compute_shapley_values` would only spawn one process when using `n_jobs=-1` and Monte Carlo methods. PR #270
- Bugfix in `RayParallelBackend`: wrong semantics for `kwargs`. PR #268
- Splitting of problem preparation and solution in Least-Core computation. Umbrella function for LC methods. PR #257
- Operations on `ValuationResult` and `Status` and some cleanup PR #248
- Bug fix and minor improvements: Fixes bug in TMCS with remote Ray cluster, raises an error for dummy sequential parallel backend with TMCS, clones model inside `Utility` before fitting by default, with flag `clone_before_fit` to disable it, catches all warnings in `Utility` when `show_warnings` is `False`. Adds Miner and Gloves toy games utilities PR #247
- GH action to mark issues as stale PR #201
- Disabled caching of Utility values as well as repeated evaluations by default PR #211
- Test and officially support Python version 3.9 and 3.10 PR #208
- Breaking change: Introduces a class ValuationResult to gather and inspect results from all valuation algorithms PR #214
- Fixes bug in Influence calculation with multidimensional input and adds new example notebook PR #195
- Breaking change: Passes the input to `MapReduceJob` at initialization, removes `chunkify_inputs` argument from `MapReduceJob`, removes `n_runs` argument from `MapReduceJob`, calls the parallel backend's `put()` method for each generated chunk in `_chunkify()`, renames ParallelConfig's `num_workers` attribute to `n_local_workers`, fixes a bug in `MapReduceJob`'s chunkification when `n_runs` >= `n_jobs`, and defines a sequential parallel backend to run all jobs in the current thread PR #232
- New method: Implements exact and Monte Carlo Least Core for data valuation, adds `from_arrays()` class method to the `Dataset` and `GroupedDataset` classes, adds `extra_values` argument to `ValuationResult`, adds `compute_removal_score()` and `compute_random_removal_score()` helper functions PR #237
- New method: Group Testing Shapley for valuation, from Jia et al. 2019 PR #240
- Fixes bug in ray initialization in `RayParallelBackend` class PR #239
- Implements "Egalitarian Least Core", adds cvxpy as a dependency and uses it instead of scipy as optimizer PR #243
- Simplified and fixed powerset sampling and testing PR #181
- Simplified and fixed publishing to PyPI from CI PR #183
- Fixed bug in release script and updated contributing docs. PR #184
- Added Pull Request template PR #185
- Modified Pull Request template to automatically link PR to issue PR #186
- First implementation of Owen Sampling, squashed scores, better testing PR #194
- Improved documentation on caching, Shapley, caveats of values, bibtex PR #194
- Breaking change: Rearranging of modules to accommodate for new methods PR #194
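
One of the entries above describes cloning the model inside `Utility` before fitting (flag `clone_before_fit`) and silencing warnings unless `show_warnings` is set. The following is a minimal, self-contained sketch of that behaviour under stated assumptions. It is not pyDVL's implementation: `SketchUtility` and `MeanModel` are hypothetical names, and `copy.deepcopy` stands in for whatever cloning mechanism the library actually uses. Only the two flag names are taken from the entry.

```python
import copy
import warnings


class MeanModel:
    """Hypothetical toy model: predicts the mean of the training targets."""

    def __init__(self):
        self.mean_ = None

    def fit(self, y):
        if len(y) < 2:
            warnings.warn("fitting on fewer than two points")
        self.mean_ = sum(y) / len(y)
        return self

    def score(self, y):
        # Negative mean squared error, so that higher is better.
        return -sum((v - self.mean_) ** 2 for v in y) / len(y)


class SketchUtility:
    """Hypothetical stand-in for a utility: fit on a subset, score on test data."""

    def __init__(self, model, data, test, clone_before_fit=True, show_warnings=False):
        self.model = model
        self.data = data
        self.test = test
        self.clone_before_fit = clone_before_fit
        self.show_warnings = show_warnings

    def __call__(self, indices):
        # Fit a copy by default so the caller's model is never mutated.
        model = copy.deepcopy(self.model) if self.clone_before_fit else self.model
        subset = [self.data[i] for i in indices]
        with warnings.catch_warnings():
            if not self.show_warnings:
                # Suppress every warning raised while fitting and scoring.
                warnings.simplefilter("ignore")
            model.fit(subset)
            return model.score(self.test)
```

With the default flags, evaluating `SketchUtility(model, data, test)([0, 2])` leaves `model` unfitted; passing `clone_before_fit=False` fits the caller's object in place, which is cheaper but lets one evaluation affect the next.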
Mostly API documentation and notebooks, plus some bugfixes.
In PR #161:
- Support for $$ math in sphinx docs.
- Usage of sphinx extension for external links (introducing new directives like `:gh:`, `:issue:` and `:tfl:` to construct standardised links to external resources).
- Only update auto-generated documentation files if there are changes. Some minor additions to `update_docs.py`.
- Parallelization of exact combinatorial Shapley.
- Integrated KNN Shapley into the main interface `compute_shapley_values`.
In PR #161:
- Improved main docs and Shapley notebooks. Added or fixed many docstrings, readme and documentation for contributors. Typos, grammar and style in code, documentation and notebooks.
- Internal renaming and rearranging in the parallelization and caching modules.
- Bug in random matrix generation PR #161.
- Bugs in MapReduceJob's `_chunkify` and `_backpressure` methods PR #176.
This is the very first release of pyDVL.
It contains:
- Data Valuation Methods:
  - Leave-One-Out
  - Influence Functions
  - Shapley:
    - Exact Permutation and Combinatorial
    - Montecarlo Permutation and Combinatorial
    - Truncated Montecarlo Permutation
- Caching of results with Memcached
- Parallelization of computations with Ray
- Documentation
- Notebooks containing examples of different use cases
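
For readers unfamiliar with the permutation-based Shapley methods listed above, here is a minimal pure-Python sketch of the exact and Monte Carlo permutation estimators. It does not use pyDVL's API: every function name is hypothetical, and the utility is a small "gloves" game (value = number of matched left/right pairs), chosen only because its Shapley values are easy to check by hand.

```python
import itertools
import random
from typing import Callable, FrozenSet, List, Sequence

UtilityFn = Callable[[FrozenSet[int]], float]


def gloves_utility(left: FrozenSet[int], right: FrozenSet[int]) -> UtilityFn:
    """Toy game: a coalition is worth the number of left/right pairs it holds."""

    def utility(subset: FrozenSet[int]) -> float:
        return float(min(len(subset & left), len(subset & right)))

    return utility


def _add_marginals(permutation: Sequence[int], utility: UtilityFn, totals: List[float]) -> None:
    """Walk one permutation, crediting each player with its marginal contribution."""
    coalition: FrozenSet[int] = frozenset()
    previous = utility(coalition)
    for player in permutation:
        coalition = coalition | {player}
        current = utility(coalition)
        totals[player] += current - previous
        previous = current


def exact_permutation_shapley(n: int, utility: UtilityFn) -> List[float]:
    """Average marginal contributions over all n! permutations (small n only)."""
    totals = [0.0] * n
    count = 0
    for permutation in itertools.permutations(range(n)):
        _add_marginals(permutation, utility, totals)
        count += 1
    return [t / count for t in totals]


def montecarlo_permutation_shapley(
    n: int, utility: UtilityFn, n_permutations: int, seed: int = 0
) -> List[float]:
    """Average marginal contributions over randomly sampled permutations."""
    rng = random.Random(seed)
    totals = [0.0] * n
    players = list(range(n))
    for _ in range(n_permutations):
        rng.shuffle(players)
        _add_marginals(players, utility, totals)
    return [t / n_permutations for t in totals]
```

For two left gloves (players 0 and 1) and one right glove (player 2), the exact values are 1/6, 1/6 and 2/3. The Monte Carlo estimate approaches these as `n_permutations` grows, and its entries always sum to the value of the full coalition because each sampled permutation's marginals telescope. A truncated variant would additionally stop walking a permutation once the marginal contributions become negligible.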