You can find more details about DT-sampler at https://arxiv.org/abs/2307.13333.
DT-sampler is an ensemble model based on decision tree sampling. Unlike random forest, DT-sampler samples decision trees uniformly from a given space, which can yield more stable results and higher interpretability than random forest. DT-sampler has only two key parameters: #node and threshold. #node constrains the size of the decision trees that DT-sampler generates, and threshold sets the minimum training accuracy that each sampled decision tree must reach.
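To make the two parameters concrete, here is a toy sketch that treats the sampling space as an explicit list of candidate trees. It is not how DT-sampler works internally: the real space is defined implicitly by a SAT encoding and is sampled with a SAT sampler rather than enumerated. The candidate list, the function names, and the reading of #node as "exactly this many nodes" are all assumptions made for illustration.

```python
import random

# Hypothetical toy candidates: (tree_id, number_of_nodes, training_accuracy).
candidates = [
    ("t1", 3, 0.72),
    ("t2", 5, 0.91),
    ("t3", 5, 0.88),
    ("t4", 7, 0.95),
    ("t5", 5, 0.79),
    ("t6", 5, 0.93),
]


def high_accuracy_space(trees, n_nodes, threshold):
    """Keep trees with n_nodes nodes (assumed: exactly) and accuracy >= threshold."""
    return [t for t in trees if t[1] == n_nodes and t[2] >= threshold]


def sample_uniformly(space, n_trees, seed):
    """Draw n_trees independent, equally likely picks from the space."""
    rng = random.Random(seed)
    return [rng.choice(space) for _ in range(n_trees)]


space = high_accuracy_space(candidates, n_nodes=5, threshold=0.85)
sampled = sample_uniformly(space, n_trees=4, seed=0)
print([t[0] for t in space])
print([t[0] for t in sampled])
```

Raising the threshold shrinks the space to more accurate trees, while the node count fixes how large each sampled tree may be.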
① Encode the construction of decision trees as a SAT problem.
② Use a SAT sampler to uniformly sample multiple satisfying solutions from the high-accuracy space.
③ Decode the satisfying solutions back into decision trees.
④ Estimate the training accuracy distribution of the decision trees in the high-accuracy space.
⑤ Measure feature importance by calculating the emergence probability of each feature.
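Steps ④ and ⑤ can be sketched on already-decoded trees. The example below assumes one plausible reading of "emergence probability": the fraction of sampled trees in which a feature appears at least once. The tree records, feature names, and function names are hypothetical and are not part of the DT-sampler API.

```python
from collections import Counter

# Hypothetical decoded samples: the features used at each tree's internal
# nodes and the tree's training accuracy.
sampled_trees = [
    {"features": ["age", "bmi", "age"], "train_acc": 0.90},
    {"features": ["age", "glucose"], "train_acc": 0.92},
    {"features": ["glucose", "bmi"], "train_acc": 0.90},
    {"features": ["age"], "train_acc": 0.95},
]


def emergence_probability(trees):
    """Fraction of trees that use each feature at least once."""
    counts = Counter()
    for tree in trees:
        # set() so that a feature repeated inside one tree counts once.
        counts.update(set(tree["features"]))
    return {name: n / len(trees) for name, n in counts.items()}


def accuracy_distribution(trees):
    """Empirical distribution of training accuracy over the sampled trees."""
    counts = Counter(tree["train_acc"] for tree in trees)
    return {acc: n / len(trees) for acc, n in counts.items()}


importance = emergence_probability(sampled_trees)
acc_dist = accuracy_distribution(sampled_trees)
print(sorted(importance.items(), key=lambda kv: -kv[1]))
print(acc_dist)
```

Because the trees are sampled uniformly, these sample frequencies serve as estimates of the corresponding quantities over the whole high-accuracy space.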
matplotlib == 3.6.3
numpy == 1.21.0
pandas == 1.5.3
pyunigen == 2.5.2
scikit_learn == 1.2.1
scipy == 1.11.1
z3_solver == 4.12.1.0
...
dt_sampler = DT_sampler(X_train, y_train, #node, threshold, "./cnf/cnf_name.cnf")
dt_sampler.run(#tree, method = "unigen", seed)
...
Chao Huang ([email protected])
Department of Computational Biology and Medical Science
The University of Tokyo