Releases: EducationalTestingService/skll
SKLL 1.5.3
This is a minor release of SKLL with the most notable change being compatibility with the latest version of scikit-learn (v0.20.1).
What's new
- SKLL is now compatible with scikit-learn v0.20.1 (Issue #432, PR #439).
- GradientBoostingClassifier and GradientBoostingRegressor now accept sparse matrices as input (Issue #428, PR #429).
- The model_params property now works for SVC learners with a linear kernel (Issue #425, PR #443).
- Improved documentation (Issue #423, PR #437).
- Update generate_predictions to output the probabilities for all classes instead of just the first class (Issue #430, PR #433). Note: this change breaks backward compatibility with previous SKLL versions, since the output file now always includes a column header.
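To illustrate the new output shape, here is a minimal sketch that writes one row per instance, with a probability column for every class and an always-present header row. The helper write_probabilities and the exact column layout (an id column followed by one column per class label) are assumptions for illustration, not SKLL's actual output code; the generate_predictions documentation describes the real format.

```python
import csv
import io


def write_probabilities(ids, class_labels, probabilities):
    """Hypothetical helper: one row per instance, one column per class.

    probabilities[i][j] is the predicted probability of class_labels[j]
    for the instance ids[i].
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # The header row is always written; this is the backward-incompatible part
    writer.writerow(["id"] + [str(label) for label in class_labels])
    for instance_id, row in zip(ids, probabilities):
        if len(row) != len(class_labels):
            raise ValueError(f"expected {len(class_labels)} probabilities for {instance_id}")
        writer.writerow([instance_id] + list(row))
    return buffer.getvalue()


print(write_probabilities(["EX1", "EX2"], ["cat", "dog"], [[0.8, 0.2], [0.3, 0.7]]))
```

Any downstream script that assumed a headerless, single-probability file would need to skip the first row and pick the column for the class it cares about.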
Bugfixes
- Fixed broken links in documentation (Issues #421 and #422, PR #437).
- Fixed data type conversion in NDJWriter (Issue #416, PR #440).
- Properly handle the possible combinations of trained model and prediction set vectorizers in Learner.predict (Issue #414, PR #445).
Other changes
SKLL 1.5.2
This is a hotfix release that addresses a single issue: Learner instances created via the from_file() method did not get loggers associated with them. This meant that any warning generated for such learner instances would have led to AttributeError exceptions.
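The sketch below reproduces this class of bug with a hypothetical ToyLearner class (not SKLL's Learner). Restoring an object from saved state without calling __init__, as unpickling does, skips the line that sets the logger, so the first warning raises AttributeError. Re-attaching a logger after loading is one way to fix it; SKLL's actual fix may differ in detail.

```python
import logging


class ToyLearner:
    """Hypothetical stand-in for a learner object; not SKLL's Learner."""

    def __init__(self, name, logger=None):
        self.name = name
        self.logger = logger or logging.getLogger(__name__)

    def to_state(self):
        # Save everything except the logger, which is process-specific
        return {k: v for k, v in self.__dict__.items() if k != "logger"}

    @classmethod
    def from_state_buggy(cls, state):
        # Like unpickling, cls.__new__ skips __init__, so no logger is set
        learner = cls.__new__(cls)
        learner.__dict__.update(state)
        return learner

    @classmethod
    def from_state(cls, state, logger=None):
        # The fix: explicitly re-attach a logger after restoring the state
        learner = cls.from_state_buggy(state)
        learner.logger = logger or logging.getLogger(__name__)
        return learner

    def warn(self, message):
        self.logger.warning(message)


state = ToyLearner("demo").to_state()
try:
    ToyLearner.from_state_buggy(state).warn("this raises")
except AttributeError as exc:
    print("without the fix:", exc)
ToyLearner.from_state(state).warn("logger attached again")
```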
SKLL 1.5.1
This is primarily a bug fix release.
Bugfixes
- Generate the "folds_file" warnings only when "folds_file" is specified (issue #404, PR #405).
- Modify Learner.save() to deal properly with reading in and re-saving older models (issue #406, PR #407).
- Fix a regression that caused the output directories not to be created automatically (issue #408, PR #409).
SKLL 1.5
This is a major new release of SKLL.
What's new
- Several new scikit-learn learners are included, along with reasonable default parameter grids for tuning where appropriate (issues #256 & #375, PR #377):
  - BayesianRidge
  - DummyRegressor
  - HuberRegressor
  - Lars
  - MLPRegressor
  - RANSACRegressor
  - TheilSenRegressor
  - DummyClassifier
  - MLPClassifier
  - RidgeClassifier
- Allow computing any number of additional evaluation metrics alongside the tuning objective (issue #350, PR #384).
- Rename the cv_folds_file configuration option to folds_file. The former is still supported with a deprecation warning but will be removed in the next release (PR #367).
- Add a new configuration option, use_folds_file_for_grid_search, which controls whether the inner-loop grid search in a cross-validation experiment with a custom folds file also uses the folds from the file. It is set to True by default. Setting it to False means that the inner loop uses regular 3-fold cross-validation and ignores the file (PR #367).
- Also add a keyword argument called use_custom_folds_for_grid_search to the Learner.cross_validate() method (PR #367).
- Learning curves can now be plotted from existing summary files using the new plot_learning_curves command-line utility (issue #346, PR #396).
- Overhaul logging in SKLL. All messages are now logged both to the console (if running interactively) and to log files. Read more about the SKLL log files in the Output Files section of the documentation (issue #369, PR #380).
- neg_log_loss is now available as an objective function for classification (issue #327, PR #392).
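A cross-validation configuration that uses several of these options might look like the fragment below, parsed with Python's standard configparser so the values can be checked. The section placement (folds_file under Input, use_folds_file_for_grid_search and objectives under Tuning, metrics under Output), the file names, and the list syntax are assumptions based on our reading of the SKLL configuration documentation; a real experiment needs further options not shown here.

```python
import ast
import configparser

# Hypothetical experiment configuration; names, paths, and values are illustrative
CONFIG_TEXT = """
[General]
experiment_name = demo_cv
task = cross_validate

[Input]
train_directory = train
featuresets = [["example_features"]]
learners = ["LogisticRegression"]
folds_file = folds.csv

[Tuning]
grid_search = true
objectives = ["neg_log_loss"]
use_folds_file_for_grid_search = false

[Output]
metrics = ["accuracy", "f1_score_micro"]
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

folds_file = config.get("Input", "folds_file")
# Fall back to True when the option is absent, matching the documented default
use_folds = config.getboolean("Tuning", "use_folds_file_for_grid_search", fallback=True)
# List-valued options are written as Python literals in this sketch
objectives = ast.literal_eval(config.get("Tuning", "objectives"))
metrics = ast.literal_eval(config.get("Output", "metrics"))
print(folds_file, use_folds, objectives, metrics)
```

With use_folds_file_for_grid_search set to false, the outer loop would still follow folds.csv while the inner grid search falls back to regular 3-fold cross-validation.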
Changes
- SKLL now supports Python 3.6. Although Python 3.4 and 3.5 will still work, 3.6 is now the officially supported Python 3 version. Python 2.7 is still supported. (issue #355, PR #360).
- The required version of scikit-learn has been bumped up to 0.19.1 (issue #328, PR #330).
- The learning curve y-limits are now computed a bit more intelligently (issue #389, PR #390).
- Raise a warning if the ablation flag is used for an experiment that uses train_file/test_file, since this is not supported (issue #313, PR #392).
- Raise a warning if both fixed_parameters and param_grids are specified (issue #185, PR #297).
- Disable grid search if no default parameter grids are available in SKLL and the user doesn't provide parameter grids either (issue #376, PR #378).
- SKLL has a copy of scikit-learn's DictVectorizer because it needs some custom functionality. Most (but not all) of our modifications have now been merged into scikit-learn, so our custom version is now condensed to just a single method (issue #263, PR #374).
- Improved outputs for cross-validation tasks (issues #349 & #371, PRs #365 & #372):
  - When a folds file is specified, the log no longer erroneously shows the full dictionary.
  - In the results, show that the cross-validation folds come from the folds file when one is specified.
  - In the results, show that the grid search folds come from the folds file if the grid search ends up using it.
  - Do not show the stratified folds information in the results when a folds file is specified.
  - Show the value of use_folds_file_for_grid_search in the results when appropriate.
  - Show grid-search-related information in the results only when grid search is actually performed.
- The Travis CI plan was broken up into multiple jobs in order to get around the 50 minute limit (issue #385, PR #387).
- For the conda package, some of the dependencies are now sourced from the conda-forge channel.
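The two grid-search checks in the list above (warning when both fixed_parameters and param_grids are given, and disabling grid search when no grid is available) can be sketched as a small validation function. DEFAULT_PARAM_GRIDS and resolve_grid_search are hypothetical names, and the grid contents are placeholders rather than SKLL's actual defaults.

```python
import warnings

# Hypothetical table of learners that ship with default parameter grids;
# the grid values are placeholders, not SKLL's real defaults
DEFAULT_PARAM_GRIDS = {"LogisticRegression": [{"C": [0.01, 0.1, 1.0, 10.0]}]}


def resolve_grid_search(learner_name, grid_search, fixed_parameters=None, param_grids=None):
    """Return whether grid search should actually run (hypothetical helper)."""
    if fixed_parameters and param_grids:
        warnings.warn("Both fixed_parameters and param_grids were specified.")
    if grid_search and not param_grids and learner_name not in DEFAULT_PARAM_GRIDS:
        # Nothing to search over, so turn grid search off instead of failing
        warnings.warn(f"No default parameter grid for {learner_name} and none "
                      "was provided; disabling grid search.")
        return False
    return grid_search


print(resolve_grid_search("MyCustomLearner", grid_search=True))
```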
Bugfixes
- Fix the bug that was causing the inner grid-search loop of a cross-validation experiment to use a single job instead of the number specified via grid_search_jobs (issue #363, PR #367).
- Fix unbound variable in readers.py (issue #340, PR #392).
- Fix bug when running a learning curve experiment via gridmap (issue #386, PR #390).
- Fix a mismatch between the default number of grid search folds and the default number of slots requested via gridmap (issue #342, PR #367).
Documentation
SKLL 1.3
This is a major new release of SKLL.
New features
- You can now generate learning curves for multiple learners, multiple feature sets, and multiple objectives in a single experiment by using task=learning_curve in the configuration file. See the documentation for more details (issue #221, PR #332).
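Here is a minimal sketch of such a configuration, parsed with Python's configparser. Apart from task=learning_curve, the option names, section placement, and all values are illustrative assumptions, and the final count assumes one learning curve per learner, feature set, and objective combination.

```python
import ast
import configparser

# Hypothetical learning-curve configuration; names and paths are illustrative
CONFIG_TEXT = """
[General]
experiment_name = demo_learning_curve
task = learning_curve

[Input]
train_directory = train
featuresets = [["f1", "f2"], ["f3"]]
learners = ["LogisticRegression", "SVC"]

[Tuning]
objectives = ["accuracy", "f1_score_micro"]
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
learners = ast.literal_eval(config["Input"]["learners"])
featuresets = ast.literal_eval(config["Input"]["featuresets"])
objectives = ast.literal_eval(config["Tuning"]["objectives"])

# Assumption: one curve per (learner, feature set, objective) combination
n_curves = len(learners) * len(featuresets) * len(objectives)
print(config["General"]["task"], n_curves)  # learning_curve 8
```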
Changes
- The required version of scikit-learn has been bumped up to 0.18.1 (issue #328, PR #330).
- SKLL now uses the MKL backend on macOS/Linux instead of OpenBLAS when used as a conda package.
Bugfixes
- Fix deprecation warning when using Learner.model_params() (issue #325, PR #329).
- Update the definitions of SKLL F1 metrics as a result of the scikit-learn upgrade (issue #325, PR #330).
- Bring documentation for SVC parameter grids up to date with the code (issue #334, PR #337).
- Update documentation to make it clear that the SKLL conda package is only available for Python 3.4. For other Python versions, users should use pip.
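SKLL exposes several F1 variants as metrics (for example, f1_score_micro and f1_score_macro). The from-scratch sketch below shows how micro and macro averaging differ; it is an illustration of the standard definitions, not SKLL's or scikit-learn's implementation, and f1_scores is a hypothetical helper name.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Per-class, macro-averaged, and micro-averaged F1 (illustrative only)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # predicted this class, but wrongly
            fn[true] += 1  # missed the true class

    def f1(tp_, fp_, fn_):
        # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    macro = sum(per_class.values()) / len(labels)  # unweighted mean over classes
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))  # pooled counts
    return per_class, macro, micro


per_class, macro, micro = f1_scores(["a", "a", "b", "c"], ["a", "b", "b", "c"])
print(per_class, round(macro, 3), micro)
```

For single-label multi-class data, micro-averaged F1 equals accuracy (0.75 here), while macro F1 weights every class equally regardless of its frequency.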
SKLL 1.2.1
SKLL 1.2
This release includes major changes as well as a number of bugfixes.
Changes:
- The required version of scikit-learn has been bumped up to 0.17.1 (issue #273, PRs #288 and #308)
- You can now optionally save cross-validation folds to a file for later analysis (issue #259, PR #262)
- Update documentation to be clear about when two FeatureSet instances are deemed equal (issue #272, PR #294)
- You can now specify multiple objective functions for parameter tuning (issue #115, PR #291)
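Saved folds and custom folds files both map instance IDs to folds. The round-trip sketch below assumes a two-column CSV layout with a header row (instance ID first, fold label second); the helper names and the "fold" column header are hypothetical, so the SKLL documentation should be checked for the exact format it writes and expects.

```python
import csv
import io


def write_folds(fold_map):
    """Hypothetical helper: write an id -> fold mapping as CSV with a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "fold"])
    for instance_id, fold in sorted(fold_map.items()):
        writer.writerow([instance_id, fold])
    return buffer.getvalue()


def read_folds(text):
    """Hypothetical helper: parse the CSV back, skipping the header row.

    Fold labels are kept as strings, since they are just group names.
    """
    rows = csv.reader(io.StringIO(text))
    next(rows)  # skip the header
    return {instance_id: fold for instance_id, fold in rows}


saved = write_folds({"EX2": 1, "EX1": 0})
print(read_folds(saved))
```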
Bugfixes:
- Use a fixed random state when doing non-stratified k-fold cross-validation (issue #247, PR #286)
- Fix errors when reusing relative paths in the output section (issue #252, PR #287)
- print_model_weights now works correctly for multi-class logistic regression models (issue #274, PR #267)
- Correctly raise an IOError if the config file is not correctly specified (issue #275, PR #281)
- The evaluate task no longer crashes when the test data has labels that were not seen in the training data (issue #279, PR #290)
- The fit() method for rescaled versions of learners now works correctly when not doing grid search (issue #304, PR #306)
- Fix minor typos in the documentation and tutorial.
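Using a fixed random state makes non-stratified fold assignments reproducible across runs. The sketch below shows the idea with a seeded, local random.Random instance; kfold_assignments and the seed value are hypothetical and do not reflect SKLL's actual splitter or seed.

```python
import random


def kfold_assignments(ids, k, seed=123456789):
    """Assign each ID to one of k folds after a seeded shuffle (illustrative)."""
    # A local generator: the global random state is left untouched
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    # Deal the shuffled IDs round-robin into k folds
    return {instance_id: i % k for i, instance_id in enumerate(shuffled)}


ids = [f"EX{i}" for i in range(10)]
first = kfold_assignments(ids, k=3)
second = kfold_assignments(ids, k=3)
print(first == second)  # True: same seed, same folds
```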
SKLL 1.1.1
This is a minor bugfix release. It fixes:
- Issue where a FileExistsError would be raised when processing many configs (PR #260)
- Instance of cv_folds instead of num_cv_folds in the documentation (PR #248).
- Crash with print_model_weights and Logistic Regression models without intercepts (issue #250, PR #251)
- Division by zero error when there was only one example (issue #253, PR #254)
SKLL 1.1.0
The biggest changes in this release are that the required version of scikit-learn has been bumped up to 0.16.1 and config file parsing is much more robust and gives much better error messages when users make mistakes.
Implemented enhancements
- Base estimators other than the defaults are now supported for AdaBoost classifiers and regressors (#238)
- Users can now specify the number of cross-validation folds to use in the config file (#222)
- Decision Trees and Random Forests no longer need dense inputs (#207)
- Stratification during cross-validation is now optional (#160)
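The difference between stratified and non-stratified folds can be sketched as follows: stratification deals each label's instances round-robin across folds so every fold keeps roughly the same label proportions, while the non-stratified path ignores labels. fold_assignments is a hypothetical helper that skips shuffling for clarity; it is not SKLL's implementation.

```python
from collections import defaultdict


def fold_assignments(ids, labels, k, stratified=True):
    """Assign IDs to k folds, optionally preserving label proportions."""
    if not stratified:
        # Ignore labels entirely (no shuffling here, for clarity)
        return {instance_id: i % k for i, instance_id in enumerate(ids)}
    by_label = defaultdict(list)
    for instance_id, label in zip(ids, labels):
        by_label[label].append(instance_id)
    folds = {}
    offset = 0
    for label in sorted(by_label):
        # Deal this label's instances round-robin, continuing where the
        # previous label stopped so fold sizes stay balanced overall
        for j, instance_id in enumerate(by_label[label]):
            folds[instance_id] = (offset + j) % k
        offset += len(by_label[label])
    return folds


ids = [f"EX{i}" for i in range(6)]
labels = ["a", "a", "a", "a", "b", "b"]
print(fold_assignments(ids, labels, k=2))
```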
Fixed bugs
- Bug when checking if hasher_features is a valid option (#234)
- Invalid/missing/duplicate options in configuration are now detected (#223)
- Stop modifying global numpy random seed (#220)
- Relative paths specified in the config file are now relative to the config file location instead of to the current directory (#213)
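Two of these fixes, detecting bad options and resolving relative paths against the config file's own directory, can be sketched with the standard configparser and os.path modules. ALLOWED_OPTIONS, PATH_OPTIONS, and load_config are hypothetical and cover only a tiny subset of options; this is not SKLL's parser. With strict=True, configparser itself rejects duplicate options.

```python
import configparser
import os

# Hypothetical subset of valid options per section, for illustration only
ALLOWED_OPTIONS = {
    "General": {"experiment_name", "task"},
    "Input": {"train_directory", "test_directory", "folds_file"},
}
# Options whose values are file-system paths (also a hypothetical subset)
PATH_OPTIONS = {"train_directory", "test_directory", "folds_file"}


def load_config(config_path, text):
    """Parse config text, reject unknown options, and resolve relative paths."""
    # strict=True makes configparser raise on duplicate sections/options
    parser = configparser.ConfigParser(strict=True)
    parser.read_string(text, source=config_path)

    base_dir = os.path.dirname(os.path.abspath(config_path))
    settings = {}
    for section in parser.sections():
        allowed = ALLOWED_OPTIONS.get(section)
        if allowed is None:
            raise ValueError(f"Unknown section: [{section}]")
        unknown = set(parser[section]) - allowed
        if unknown:
            raise ValueError(f"Unknown option(s) in [{section}]: {sorted(unknown)}")
        settings[section] = {}
        for option, value in parser[section].items():
            if option in PATH_OPTIONS and not os.path.isabs(value):
                # Relative to the config file, not the current working directory
                value = os.path.normpath(os.path.join(base_dir, value))
            settings[section][option] = value
    return settings


print(load_config("/home/user/exp/demo.cfg", "[Input]\ntrain_directory = data/train\n"))
```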
Closed issues
- Incompatibility with the latest version of scikit-learn (v0.16.1) (#235, #241, #233)
- Learner.model_params will return weights with the wrong sign if sklearn is fixed (#111)
Merged pull requests
- Overhaul configuration file parsing (@desilinguist, #246)
- Several minor bugfixes (@desilinguist, #245)
- Compatibility with scikit-learn v0.16.1 (@desilinguist, #243)
- Expose cv_folds and stratified (@aoifecahill, #240)
- Adding Report tests (@brianray, #237)
SKLL 1.0.1
This is a fairly minor bugfix release. Changes include:
- Update links in README.
- Fix crash when trying to run experiments with integer labels (Issue #225, PR #219)
- Update documentation about ablation to note that there will always be a run with all features (Issue #224, PR #226)
- Update documentation about format of cv_folds_file (Issue #225, PR #228)
- Remove duplicate words in documentation (PR #218)
- Fixed KeyError when trying to build conda recipe.
- Update outdated parameter grids in run_experiment documentation (commit 80d78e4)