Adaptive and automatic gradient tree boosting computations
aGTBoost is a lightning-fast gradient boosting library designed to avoid manual tuning and cross-validation by taking an information-theoretic approach. This makes the algorithm adaptive to the dataset at hand; it is completely automatic, with minimal worries of overfitting. Consequently, the speed-ups relative to state-of-the-art implementations are in the thousands, while the mathematical and technical knowledge required of the user is minimized.
Note: Currently for academic purposes: implementing and testing new innovations w.r.t. information-theoretic choices of GTB complexity. See the research to-do list below.
R: Finally on CRAN! Install the stable version with
install.packages("agtboost")
or install the development version from GitHub
devtools::install_github("Blunde1/agtboost/R-package")
Users experiencing errors after warnings during installation may be helped by running the following command prior to installation:
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS="true")
agtboost essentially has two functions: a training function, gbt.train, and a prediction function, predict.
The code below shows how to train an aGTBoost model using a design matrix x and a response vector y; write ?gbt.train in the console for detailed documentation.
library(agtboost)
# -- Load data --
data(caravan.train, package = "agtboost")
data(caravan.test, package = "agtboost")
train <- caravan.train
test <- caravan.test
# -- Model building --
mod <- gbt.train(train$y, train$x, loss_function = "logloss", verbose=10)
# -- Predictions --
prob <- predict(mod, test$x) # Score after logistic transformation: Probabilities
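Since the returned scores are probabilities, held-out metrics can be computed directly. The following is a minimal sketch in base R (no extra packages), assuming test$y is coded 0/1 as in the caravan data:
# -- Test log-loss from the predicted probabilities (sketch, base R only) --
eps <- 1e-15                                        # guard against log(0)
p <- pmin(pmax(prob, eps), 1 - eps)
-mean(test$y * log(p) + (1 - test$y) * log(1 - p))  # average binomial log-loss on the test set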
agtboost also contains functions for model inspection and validation.
- Feature importance: gbt.importance generates a typical feature importance plot. Techniques like inserting noise features are redundant, due to computations w.r.t. approximate generalization (test) loss.
- Convergence: gbt.convergence computes the loss over the path of boosting iterations; check visually for convergence on test loss (see the sketch after the code below).
- Model validation: gbt.ksval transforms observations to standard uniformly distributed random variables, if the model is specified correctly. It performs a formal Kolmogorov-Smirnov test and plots the transformed observations for visual inspection.
# -- Feature importance --
gbt.importance(feature_names=colnames(caravan.train$x), object=mod)
# -- Model validation --
gbt.ksval(object=mod, y=caravan.test$y, x=caravan.test$x)
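gbt.convergence is not shown in the snippet above. Here is a minimal sketch of its use, assuming it takes the fitted model together with a held-out response and design matrix and returns the loss per boosting iteration (check ?gbt.convergence for the exact interface):
# -- Convergence check (sketch) --
loss_path <- gbt.convergence(object=mod, y=caravan.test$y, x=caravan.test$x)
plot(loss_path, type="l", xlab="boosting iteration", ylab="loss")
which.min(loss_path)  # iteration with the smallest held-out loss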
The functions gbt.ksval and gbt.importance produce, respectively, a model-validation plot and a feature-importance plot.
Furthermore, an aGTBoost model (see the example code)
- is highly robust to dimensions: Comparisons to (penalized) linear regression in (very) high dimensions (see the sketch after this list),
- has minimal worries of overfitting: Stock market classification,
- and can train further given previous models: Boosting from a regularized linear model.
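As a sketch of the first point above, one can append pure noise columns to a simulated regression problem and check that gbt.importance assigns them essentially zero importance. The simulated data and names such as x_noise are made up for illustration only:
# -- Sketch: robustness to noise features (simulated data, illustrative only) --
set.seed(1)
n <- 1000
x_signal <- matrix(rnorm(n * 2), ncol = 2)       # two informative features
x_noise  <- matrix(rnorm(n * 50), ncol = 50)     # fifty pure-noise features
x_sim <- cbind(x_signal, x_noise)
colnames(x_sim) <- c("s1", "s2", paste0("noise", 1:50))
y_sim <- drop(x_signal %*% c(2, -1)) + rnorm(n)  # linear signal plus noise
mod_sim <- gbt.train(y_sim, x_sim, loss_function = "mse")
gbt.importance(feature_names = colnames(x_sim), object = mod_sim)  # noise columns should get ~zero importance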
Dependencies:
- My research
- Eigen for linear algebra
- Rcpp for the R-package
Scheduled updates (research to-do list):
- Adaptive and automatic deterministic frequentist gradient tree boosting.
- Information criterion for fast histogram algorithm (non-exact search) (Fall 2020, planned)
- Adaptive L2-penalized gradient tree boosting. (Fall 2020, planned)
- Automatic stochastic gradient tree boosting. (Fall 2020/Spring 2021, planned)
- Optimal stochastic gradient tree boosting.
References:
- An information criterion for automatic gradient tree boosting
- agtboost: Adaptive and Automatic Gradient Tree Boosting Computations
Any help on the following subjects is especially welcome:
- Utilizing sparsity (possibly Eigen sparsity).
- Parallelization (CPU and/or GPU).
- Distribution (Python, Java, Scala, ...).
- Good ideas and coding best practices in general.
Please note that the priority is to work on and push the above-mentioned scheduled updates. Patience is a virtue. :)