refined examples and hyperparameters of FastRGF (#203)
* renamed examples folders
* refined FastRGF examples
* enhanced FastRGF parameters tuning guide and fixed typo in param name
* fixed execute script command
* made examples cross-platform
1 parent b3a6e6e, commit 676f6b7
Showing 19 changed files with 138 additions and 103 deletions.
@@ -1,34 +1,9 @@
### Examples
---
* ex1: This is a binary classification problem in libsvm's sparse feature format.
Use the *shell script* [run.sh](ex1/run.sh) to perform training/test.
The dataset is downloaded from <https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#madelon>.


* ex2: This is a regression problem in dense feature format. Use the *shell script* [run.sh](ex2/run.sh) to perform training/test.
The dataset is from <https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#housing>.

# Examples

Note that for these small examples, the running time with multi-threads may be slower than with single-thread due to the overhead it introduces. However, for large datasets, one can observe an almost linear speed up.
You can learn how to use FastRGF from these examples.

The program can directly handle high dimensional sparse features in the libsvm format as in ex1. This is the recommended format to use when the dataset is relatively large (although some other formats are supported).

---
### Tips for Parameter Tuning

There are multiple training parameters that can affect performance. The following are the more important ones:

* **dtree.loss**: default is LS, but for binary classification, LOGISTIC often works better.
* **forest.ntrees**: typical range is [100,10000], and a typical value is 1000.
* **dtree.lamL2**: use a relatively large value such as 1000 or 10000. The larger dtree.lamL2 is, the larger forest.ntrees you need to use: the resulting accuracy is often better, at the cost of a longer training time.
* **dtree.lamL1**: try values in [0,1000]; a large value induces sparsity.
* **dtree.max_level**, **dtree.max_nodes**, and **dtree.new_tree_gain_ratio**: these parameters control the tree depth and size (and when to start a new tree). One can try different values (such as dtree.max_level=4, dtree.max_nodes=10, or dtree.new_tree_gain_ratio=0.5) to fine-tune performance.

You may also modify the discretization options below:

* **discretize.dense.max_buckets**: try values in the range [10,65000].
* **discretize.sparse.max_buckets**: try values in the range [10,250]. If you want to try a larger value, up to 65000, you need to edit [../include/header.h](../include/header.h) and replace
"*using disc_sparse_value_t=unsigned char;*"
by "*using disc_sparse_value_t=unsigned short;*". However, this increases memory usage.
* **discretize.sparse.max_features**: you may try a different value in [1000,10000000].
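
As a rough illustration of the tips above, the following sketch writes a starting-point config for a binary classification run. The parameter names and values are the ones mentioned in this guide; the file name `config.tuned` is hypothetical, and the `key=value` layout with `#` comments is assumed to match the `inputs/config` files shipped with the examples.

```shell
#!/bin/sh
# Sketch: write a starting-point config using the values suggested above.
# "config.tuned" is a hypothetical file name, not one shipped with FastRGF.
config=config.tuned

cat > "${config}" <<'EOF'
# training options
dtree.loss=LOGISTIC
forest.ntrees=1000
dtree.lamL2=1000
dtree.lamL1=10
dtree.max_level=4

# discretization options
discretize.sparse.max_buckets=250
EOF

# Show the options that were written (comments and blank lines skipped).
grep -v '^#' "${config}" | grep '='
```

Such a file could then be passed to the training executable with `-config=`, as the example scripts in this commit do.
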
Note that for these small examples, the running time with multithreading may be slower than with single-threading due to the overhead it introduces.
However, for large datasets, one can observe an almost linear speedup.

FastRGF can directly handle high-dimensional sparse features in the libsvm format as in [binary_classification example](./binary_classification).
This is the recommended format to use when the dataset is relatively large (although some other formats are supported).
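
For readers unfamiliar with the libsvm sparse format mentioned above: each line holds a label followed by `index:value` pairs for the non-zero features only. The sketch below writes a tiny hypothetical file (`toy.libsvm`, with made-up values that are not part of the repository) and uses `awk` to report the largest feature index.

```shell
#!/bin/sh
# Sketch: a toy file in libsvm sparse format (hypothetical data).
# Each line is: <label> <index>:<value> <index>:<value> ...
cat > toy.libsvm <<'EOF'
+1 1:0.5 3:1.2 7:0.1
-1 2:0.8 7:2.0
+1 4:1.0
EOF

# Find the largest feature index by splitting each pair on ":".
max_index=$(awk '{
  for (i = 2; i <= NF; i++) {
    split($i, pair, ":")
    if (pair[1] + 0 > max) max = pair[1] + 0
  }
} END { print max }' toy.libsvm)

echo "rows: $(wc -l < toy.libsvm | tr -d ' '), max feature index: ${max_index}"
```

Because only non-zero features are stored, a row with three active features out of thousands still takes a single short line, which is why the format suits large, high-dimensional datasets.
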
@@ -0,0 +1,20 @@
# Binary Classification Example

This example shows how to run a binary classification task with FastRGF.
The dataset for this example is taken from [here](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#madelon), and the features are written in the libsvm sparse format.

Make sure that the executable files are placed in the `../../bin` folder.

Execute the shell script in this folder to run the example:

For Windows:

```
run.sh
```

For Unix-like systems:

```
bash run.sh
```
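
The prerequisite above can be checked before launching the script. The following is a minimal sketch: the helper name `check_exes` is hypothetical, and the executable names `forest_train` and `forest_predict` are the ones referenced by the example scripts in this commit. Accepting an `.exe` suffix is an assumption about Windows builds.

```shell
#!/bin/sh
# Sketch: verify that the FastRGF executables are where run.sh expects them.
# check_exes is a hypothetical helper, not part of the repository.
check_exes() {
  bin_dir=$1
  for name in forest_train forest_predict; do
    # Accept a plain name or a .exe suffix (assumption for Windows builds).
    if [ ! -f "${bin_dir}/${name}" ] && [ ! -f "${bin_dir}/${name}.exe" ]; then
      echo "missing: ${bin_dir}/${name}" >&2
      return 1
    fi
  done
  return 0
}

if check_exes ../../bin; then
  echo "executables found"
else
  echo "build FastRGF first, then re-run this check"
fi
```
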
File renamed without changes.
File renamed without changes.
File renamed without changes.
This file was deleted.
@@ -0,0 +1,20 @@
# Regression Example

This example shows how to run a regression task with FastRGF.
The dataset for this example is taken from [here](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#housing), and the features are written in the dense format.

Make sure that the executable files are placed in the `../../bin` folder.

Execute the shell script in this folder to run the example:

For Windows:

```
run.sh
```

For Unix-like systems:

```
bash run.sh
```
3 changes: 1 addition & 2 deletions
FastRGF/examples/ex2/inputs/config → FastRGF/examples/regression/inputs/config
@@ -1,11 +1,10 @@
# discretization options
discretize.dense.max_buckets=250
discretize.dense.lamL2=10

# training options
dtree.new_tree_gain_ratio=1.0
dtree.loss=LS
dtree.lamL1=10
dtree.lamL2=1000
forest.ntrees=1000
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -0,0 +1,31 @@
#!/bin/sh -f

exe_train=../../bin/forest_train
exe_predict=../../bin/forest_predict

trn=inputs/housing.train
tst=inputs/housing.test
feat_name=inputs/feature.names

config=inputs/config

model_rgf=outputs/model-rgf

prediction=outputs/prediction

orig_format="y.x"

echo ------ training ------
time ${exe_train} -config=${config} trn.x-file=${trn} trn.x-file_format=${orig_format} trn.target=REAL tst.x-file=${tst} tst.x-file_format=${orig_format} tst.target=REAL model.save=${model_rgf}
echo " "

echo ------ printing forest ------
${exe_predict} model.load=${model_rgf} tst.print-forest=${model_rgf}.print tst.feature-names=${feat_name}
echo " "

echo ------ testing ------
echo === ${trn} ===
time ${exe_predict} tst.x-file=${trn} tst.x-file_format=${orig_format} tst.target=REAL model.load=${model_rgf} tst.output-prediction=${prediction}-train
echo " "
echo === ${tst} ===
time ${exe_predict} tst.x-file=${tst} tst.x-file_format=${orig_format} tst.target=REAL model.load=${model_rgf} tst.output-prediction=${prediction}-test
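
The training call in the script above writes its model under `outputs/`. As a cautious sketch (it only prints the command rather than running it), the following shows how the same `key=value` arguments could be assembled after making sure that directory exists. The function name `train_cmd` is hypothetical; the option names and paths are copied from the script above.

```shell
#!/bin/sh
# Sketch: assemble the training command used above without executing it.
# train_cmd is a hypothetical helper; option names are copied from run.sh.
train_cmd() {
  trn=$1
  tst=$2
  model=$3
  fmt="y.x"
  printf '%s' "../../bin/forest_train -config=inputs/config"
  # printf reuses the format string once per remaining argument.
  printf ' %s' \
    "trn.x-file=${trn}" "trn.x-file_format=${fmt}" "trn.target=REAL" \
    "tst.x-file=${tst}" "tst.x-file_format=${fmt}" "tst.target=REAL" \
    "model.save=${model}"
  printf '\n'
}

# Make sure the output directory exists before a real run would write to it.
mkdir -p outputs
train_cmd inputs/housing.train inputs/housing.test outputs/model-rgf
```

Printing the command first makes it easy to inspect the arguments before committing to a full training run.
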