Releases: HLT-ISTI/QuaPy
QuaPy v0.1.9
QuaPy v0.1.9 released!
Major changes can be consulted below:
- Added LeQua 2024 datasets and normalized match distance to qp.error
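  For illustration, a minimal sketch of how these additions might be used. The loader name fetch_lequa2024, its task argument and return values, and the nmd function in qp.error are assumptions here (mirroring the existing fetch_lequa2022 interface); check the documentation for the exact signatures.

```python
import numpy as np
import quapy as qp

# loader name and return values assumed to mirror fetch_lequa2022
training, val_gen, test_gen = qp.datasets.fetch_lequa2024(task='T2')

# normalized match distance between a true and an estimated prevalence vector
# (the function name `nmd` in qp.error is an assumption)
true_prev = np.asarray([0.2, 0.3, 0.5])
estim_prev = np.asarray([0.25, 0.30, 0.45])
print(qp.error.nmd(true_prev, estim_prev))
```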
- Improved data loaders for UCI binary and UCI multiclass datasets (thanks to Lorenzo Volpi!); these datasets can be loaded with standardised covariates (default)
- Added a default classifier for aggregative quantifiers, which can now be instantiated without specifying the classifier. The default classifier can be accessed in qp.environ['DEFAULT_CLS'] and is assigned to sklearn.linear_model.LogisticRegression(max_iter=3000). If the classifier is not specified, a clone of said classifier is used. E.g.:
  pacc = PACC()
  is equivalent to:
  pacc = PACC(classifier=LogisticRegression(max_iter=3000))
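  A small self-contained example based on the description above:

```python
import quapy as qp
from quapy.method.aggregative import PACC
from sklearn.linear_model import LogisticRegression

# these two instantiations are equivalent: when no classifier is given,
# a clone of qp.environ['DEFAULT_CLS'] is used
pacc_default = PACC()
pacc_explicit = PACC(classifier=LogisticRegression(max_iter=3000))

# the default classifier in use can be inspected via
print(qp.environ['DEFAULT_CLS'])
```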
- Improved error logging in model selection. In v0.1.8 only Status.INVALID was reported; in v0.1.9 it is now accompanied by a textual description of the error
- The number of parallel workers can now be set via an environment variable, e.g.:
  N_JOBS=10 python3 your_script.py
  which has the same effect as writing the following code at the beginning of your_script.py:
  import quapy as qp
  qp.environ["N_JOBS"] = 10
- Some examples have been added to the ./examples/ dir, which now contains numbered examples from basics (0) to advanced topics (higher numbers)
- Moved the wiki documents to the ./docs/ folder so that they can be edited by the community via PRs
- Added Composable methods from Mirko Bunse's qunfold library (thanks to Mirko Bunse!)
- Added Continuous Integration with GitHub Actions (thanks to Mirko Bunse!)
- Added Bayesian CC method (thanks to Pawel Czyz!). The method is described in detail in the paper: Ziegler, Albert, and Paweł Czyż. "Bayesian Quantification with Black-Box Estimators." arXiv preprint arXiv:2302.09159 (2023).
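  A rough usage sketch; the class name BayesianCC and its location in quapy.method.aggregative are assumptions here, and the actual constructor may expose further options (e.g., the number of posterior samples) or require optional extra dependencies.

```python
import quapy as qp
from quapy.method.aggregative import BayesianCC  # class name and module path assumed
from sklearn.linear_model import LogisticRegression

# a small binary dataset shipped with QuaPy
data = qp.datasets.fetch_reviews('hp', tfidf=True, min_df=5)
train, test = data.train_test

quantifier = BayesianCC(classifier=LogisticRegression())
quantifier.fit(train)
print(quantifier.quantify(test.instances))  # point estimate of the class prevalences
```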
- Removed the binary UCI datasets {acute.a, acute.b, balance.2} from the list qp.data.datasets.UCI_BINARY_DATASETS (the datasets are still loadable via the fetch_UCIBinaryLabelledCollection and fetch_UCIBinaryDataset functions, though). The reason is that these datasets tend to yield results (for all methods) that are one or two orders of magnitude greater than for other datasets, which has a disproportionate impact on method averages (I suspect there is something wrong with those datasets).
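  For reference, a minimal sketch of loading one of the removed datasets explicitly; the notes do not show the loaders' signatures, so passing the dataset name as the first argument is an assumption here.

```python
import quapy as qp

# 'acute.a' no longer appears in qp.data.datasets.UCI_BINARY_DATASETS,
# but it can still be fetched explicitly
dataset = qp.datasets.fetch_UCIBinaryDataset('acute.a')
train, test = dataset.train_test

# or as a single labelled collection
collection = qp.datasets.fetch_UCIBinaryLabelledCollection('acute.a')
```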
QuaPy v0.1.8
- Added Kernel Density Estimation methods (KDEyML, KDEyCS, KDEyHD) as proposed in the paper: Moreo, A., González, P., & del Coz, J. J. Kernel Density Estimation for Multiclass Quantification. arXiv preprint arXiv:2401.00490, 2024
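  A minimal usage sketch of one of the new KDEy methods; the module path and the `bandwidth` hyperparameter name are assumptions (they are not stated in these notes).

```python
import quapy as qp
from quapy.method.aggregative import KDEyML  # module path assumed
from sklearn.linear_model import LogisticRegression

data = qp.datasets.fetch_reviews('hp', tfidf=True, min_df=5)
train, test = data.train_test

# KDEyML models the class-conditional densities of the posterior probabilities with KDE
# and maximizes the likelihood of the test posteriors under their mixture;
# `bandwidth` (assumed name) controls the kernel smoothing
quantifier = KDEyML(classifier=LogisticRegression(), bandwidth=0.1)
quantifier.fit(train)
print(quantifier.quantify(test.instances))
```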
- Substantial internal refactor: aggregative methods now inherit a pattern by which the fit method consists of:
  a) fitting the classifier and returning the representations of the training instances (typically the posterior probabilities, the label predictions, or the classifier scores, typically obtained through kFCV);
  b) fitting an aggregation function.
  The function implemented in step a) is inherited from the super class. Each new aggregative method now has to implement only the "aggregative_fit" of step b). This pattern was already implemented for the prediction (thus allowing evaluation functions to be performed very quickly), and is now available also for training. The main benefit is that model selection can now nest the training of quantifiers at two levels: one for the classifier, and another for the aggregation function. As a result, a method with a param grid of 10 combinations for the classifier and 10 combinations for the quantifier now implies 10 trainings of the classifier + 10×10 trainings of the aggregation function (this is typically much faster than the classifier training), whereas in versions <0.1.8 this amounted to training 10×10 (classifiers + aggregations); see the sketch below.
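  The sketch below illustrates the 10×10 grid scenario with GridSearchQ. The specific hyperparameter names ('classifier__C' for the classifier side, 'bandwidth' for the aggregation side of KDEyML) are illustrative assumptions, not a prescribed configuration.

```python
import numpy as np
import quapy as qp
from quapy.method.aggregative import KDEyML
from quapy.model_selection import GridSearchQ
from quapy.protocol import APP
from sklearn.linear_model import LogisticRegression

qp.environ['SAMPLE_SIZE'] = 100

data = qp.datasets.fetch_reviews('hp', tfidf=True, min_df=5)
train, test = data.train_test
train, val = train.split_stratified(train_prop=0.75)

# 10 classifier combinations x 10 aggregation combinations: the classifier is
# fitted only 10 times, while the (cheap) aggregation fit runs 10x10 times
param_grid = {
    'classifier__C': np.logspace(-3, 3, 10),    # classifier hyperparameters
    'bandwidth': np.linspace(0.01, 0.2, 10),    # aggregation hyperparameters (assumed name)
}

model = GridSearchQ(
    model=KDEyML(LogisticRegression()),
    param_grid=param_grid,
    protocol=APP(val),        # artificial-prevalence protocol over held-out validation data
    error=qp.error.mae,
    refit=False,
    verbose=True,
).fit(train)
print(model.best_params_)
```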
- Added different solvers for ACC and PACC quantifiers. In quapy < 0.1.8 these quantifiers tried to solve the system of equations Ax=B exactly (by means of np.linalg.solve). As noted by Mirko Bunse (thanks!), such an exact solution sometimes does not exist. In such cases, quapy < 0.1.8 resorted to CC to provide a plausible solution. ACC and PACC now resort to an approximate solution (minimizing the L2 norm of Ax-B) in such cases, as proposed by Mirko Bunse; see the numpy illustration below. A quick experiment reveals that this heuristic greatly improves the results of ACC and PACC on T2A@LeQua.
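  The fallback idea can be illustrated with plain numpy (this is only a conceptual sketch, not the library code; in particular, the quantifiers additionally need the solution to be a proper prevalence vector, which is omitted here): when A is singular, the exact solver fails, whereas a least-squares solution minimizing ||Ax - b||₂ always exists.

```python
import numpy as np

# toy example: a singular "misclassification matrix" A for which Ax = b has no exact solution
A = np.array([[0.5, 0.5],
              [0.5, 0.5]])
b = np.array([0.7, 0.3])

try:
    x = np.linalg.solve(A, b)                   # exact solver: raises LinAlgError here
except np.linalg.LinAlgError:
    x, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimizes ||Ax - b||_2 instead
print(x)
```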
- Fixed ThresholdOptimization methods (X, T50, MAX, MS, and MS2). Thanks to Tobias Schumacher and colleagues for pointing this out in Appendix A of "Schumacher, T., Strohmaier, M., & Lemmerich, F. (2021). A comparative evaluation of quantification methods. arXiv:2103.03223v3 [cs.LG]"
- Added HDx and DistributionMatchingX to non-aggregative quantifiers (see also the new example "comparing_HDy_HDx.py")
- New UCI multiclass datasets added (thanks to Pablo González). The 5 UCI multiclass datasets are those corresponding to the following criteria:
  - >1000 instances
  - >2 classes
  - classification datasets
  - Python API available
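  A possible way to enumerate and load them; both the constant UCI_MULTICLASS_DATASETS and the loader fetch_UCIMulticlassDataset are assumed names here, by analogy with their binary counterparts.

```python
import quapy as qp

# constant and loader names are assumptions, mirroring UCI_BINARY_DATASETS / fetch_UCIBinaryDataset
for name in qp.datasets.UCI_MULTICLASS_DATASETS:
    dataset = qp.datasets.fetch_UCIMulticlassDataset(name)
    train, test = dataset.train_test
    print(name, train.n_classes, len(train), len(test))
```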
- New IFCB (plankton) dataset added (thanks to Pablo González). See qp.datasets.fetch_IFCB.
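  A minimal sketch of fetching and evaluating on this dataset; the (train, test_generator) return shape of fetch_IFCB is an assumption here.

```python
import quapy as qp
from quapy.method.aggregative import PACC

# the (train, test_generator) return shape is an assumption
train, test_gen = qp.datasets.fetch_IFCB()

quantifier = PACC().fit(train)
report = qp.evaluation.evaluation_report(quantifier, protocol=test_gen, error_metrics=['mae', 'mrae'])
print(report)
```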
- Added new evaluation measures NAE, NRAE (thanks to Andrea Esuli)
- Added new meta method "MedianEstimator": an ensemble of binary base quantifiers that receives as input a dictionary of hyperparameters, explores it exhaustively (fitting and generating predictions for each combination), and returns, as the prevalence estimate, the median across all predictions.
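  A rough usage sketch based on the description above; the module path and the constructor argument names (base_quantifier, param_grid) are assumptions.

```python
import numpy as np
import quapy as qp
from quapy.method.aggregative import PACC
from quapy.method.meta import MedianEstimator  # module path assumed
from sklearn.linear_model import LogisticRegression

data = qp.datasets.fetch_reviews('hp', tfidf=True, min_df=5)
train, test = data.train_test

# one PACC is fitted per hyperparameter combination; the returned prevalence
# estimate is the median across all of them
quantifier = MedianEstimator(
    base_quantifier=PACC(LogisticRegression()),
    param_grid={'classifier__C': np.logspace(-2, 2, 5)},  # argument names assumed
)
quantifier.fit(train)
print(quantifier.quantify(test.instances))
```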
Added "custom_protocol.py" example.
- New API documentation template.
QuaPy v0.1.7
New release of QuaPy. Major changes include the abstraction of protocols as callable functions that generate samples for evaluation, the implementation of new methods including DistributionMatching, the addition of an example folder, and several optimizations. See the CHANGE_LOG.txt file for further details.
Quapy 0.1.6
Update README.md