Cătălin E. Briţa, Jacobus G. M. van der Linden (e-mail), Emir Demirović - Delft University of Technology
ConTree computes optimal binary classification trees on datasets with continuous features using dynamic programming with branch-and-bound.
If you use ConTree, please cite our paper:
- Briţa, Cătălin E., Jacobus G. M. van der Linden, and Emir Demirović. "Optimal Classification Trees for Continuous Feature Data Using Dynamic Programming with Branch-and-Bound." In Proceedings of AAAI-25 (2025).
The `pycontree` Python package can be installed from source as follows:
```sh
git clone https://github.com/ConSol-Lab/contree.git
cd contree
pip install .
```
`pycontree` can be used, for example, as follows:
```python
from pycontree import ConTree
import pandas as pd
from sklearn.metrics import accuracy_score

df = pd.read_csv("datasets/bank.txt", sep=" ", header=None)
X = df[df.columns[1:]]
y = df[0]

contree = ConTree(max_depth=3)
contree.fit(X, y)

ypred = contree.predict(X)
print("Accuracy:", accuracy_score(y, ypred))
```
See the examples folder for a number of example usages.
Note that some of the examples require the installation of extra Python packages:
```sh
pip install matplotlib seaborn graphviz
```
Graphviz additionally requires a separate installation of a binary. See their website.
The code can be compiled on Windows or Linux using CMake. Windows users can install CMake support as a Visual Studio extension and then import this repository as a CMake project. Linux users can use the following commands:
```sh
cd code
mkdir build
cd build
cmake ..
cmake --build .
```
The compiler must support the C++17 standard.
After ConTree is built, the following command can be used (for example):
```sh
./ConTree -file ../datasets/bank.txt -max-depth 3
```
Run the program without any parameters to see a full list of the available parameters.
ConTree can be configured by the following parameters:
- `max_depth`: The maximum depth of the tree. Note that a tree of depth zero has a single leaf node; a tree of depth one has one branching node and two leaf nodes.
- `max_gap`: The maximum permissible gap to the optimal solution.
- `max_gap_decay`: Use this parameter if you want to find solutions iteratively, with each iteration decreasing the `max_gap` by multiplying it with `max_gap_decay`.
- `time_limit`: The run time limit in seconds. If the time limit is exceeded, a possibly non-optimal tree is returned.
- `sort_gini`: If true, the features are sorted by Gini impurity.
- `use_upper_bound`: Enables or disables the use of upper bounds.
- `verbose`: Enables or disables verbose output.
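The parameters above can be combined when constructing the estimator. The following is a sketch only: it assumes these options are passed as keyword arguments to the `ConTree` constructor, as `max_depth` is in the usage example above; the illustrative values are not recommendations.

```python
from pycontree import ConTree

# Hypothetical configuration: search iteratively, shrinking the
# permissible optimality gap each iteration, under a time limit.
contree = ConTree(
    max_depth=3,        # a tree of depth 3 has at most 8 leaf nodes
    max_gap=0.1,        # initial permissible gap to the optimum
    max_gap_decay=0.5,  # each iteration multiplies max_gap by this factor
    time_limit=60,      # seconds; on timeout a possibly non-optimal tree is returned
    sort_gini=True,     # sort features by Gini impurity
    verbose=False,      # suppress progress output
)
```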
ConTree assumes classification labels are in the range `0 ... n_labels - 1`. Not meeting this assumption may degrade the algorithm's performance. Use sklearn's `LabelEncoder` to ensure labels are in this range.
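For instance, `LabelEncoder` maps arbitrary label values to consecutive integers starting at zero (a minimal sketch using only scikit-learn; the encoded labels can then be passed to `fit` in place of the raw ones):

```python
from sklearn.preprocessing import LabelEncoder

# Raw labels with arbitrary values (here strings).
y_raw = ["yes", "no", "yes", "maybe"]

# Encode them into the range 0 ... n_labels - 1.
encoder = LabelEncoder()
y = encoder.fit_transform(y_raw)

print(list(y))                # [2, 1, 2, 0] (classes are sorted alphabetically)
print(list(encoder.classes_)) # ['maybe', 'no', 'yes']

# Predictions can be mapped back to the original labels.
print(list(encoder.inverse_transform(y)))  # ['yes', 'no', 'yes', 'maybe']
```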
This work is a follow-up to our previous research:
- Demirović, Emir, et al. "Murtree: Optimal decision trees via dynamic programming and search." Journal of Machine Learning Research 23.26 (2022): 1-47. pdf / source
- Van der Linden, Jacobus G. M., Mathijs M. de Weerdt, and Emir Demirović. "Necessary and Sufficient Conditions for Optimal Decision Trees using Dynamic Programming." In Advances in Neural Information Processing Systems (2023). pdf / source
Other related work:
- Hu, Xiyang, Cynthia Rudin, and Margo Seltzer. "Optimal sparse decision trees." In Advances in Neural Information Processing Systems (2019). pdf / source
- Lin, Jimmy, et al. "Generalized and scalable optimal sparse decision trees." In International Conference on Machine Learning (2020). pdf / source
- Aglin, Gaël, Siegfried Nijssen, and Pierre Schaus. "Learning optimal decision trees using caching branch-and-bound search." In Proceedings of the AAAI conference on artificial intelligence (2020). pdf / source
- Mazumder, Rahul, Xiang Meng, and Haoyue Wang. "Quant-BnB: A scalable branch-and-bound method for optimal decision trees with continuous features." In International Conference on Machine Learning (2022). pdf / source