Commit

Update default epochs to 1000 and add note about optimal stopping cri… (#85)

* Update default epochs to 1000 and add note about optimal stopping criterion

Signed-off-by: Aivin V. Solatorio <[email protected]>

* Bump version to 0.2.1

Signed-off-by: Aivin V. Solatorio <[email protected]>

---------

Signed-off-by: Aivin V. Solatorio <[email protected]>
avsolatorio authored Oct 18, 2024
1 parent 64e1b04 commit 650c6ca
Showing 4 changed files with 11 additions and 4 deletions.
7 changes: 7 additions & 0 deletions README.md
@@ -30,6 +30,13 @@ pip install realtabformer
We show examples of using the REaLTabFormer for modeling and generating synthetic data from a trained model.
+> [!INFO]
+> The model implements an optimal stopping criterion based on the synthetic data distribution when training a non-relational tabular model.
+> The model will stop training when the synthetic data distribution is close to the real data distribution.
+>
+> **Make sure to set the `epochs` parameter to a large number to allow the model to fit the data better.**
+> The model will stop training when the optimal stopping criterion is met.
### REaLTabFormer for regular tabular data
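
The README note above describes a pattern in which `epochs` acts as an upper bound and a distribution check ends training early. The following is a minimal, self-contained sketch of that pattern only. It is not REaLTabFormer's actual criterion: the toy location-scale "model", `distribution_gap`, `train_until_close`, `tolerance`, and `step` are all hypothetical names chosen for illustration.

```python
import statistics


def distribution_gap(real, synthetic):
    """Toy discrepancy: gap in means plus gap in standard deviations."""
    return (
        abs(statistics.mean(real) - statistics.mean(synthetic))
        + abs(statistics.pstdev(real) - statistics.pstdev(synthetic))
    )


def train_until_close(real, max_epochs=1000, tolerance=0.05, step=0.1):
    """Train a toy location-scale "model" for at most max_epochs epochs.

    Returns (epochs_run, final_gap). Training ends early once the synthetic
    sample is within `tolerance` of the real data (hypothetical criterion).
    """
    base = [-1.0, 1.0] * 50  # fixed sample with mean 0 and standard deviation 1
    real_mean, real_std = statistics.mean(real), statistics.pstdev(real)
    mu, sigma = 0.0, 1.0  # initial model parameters
    gap = float("inf")
    for epoch in range(1, max_epochs + 1):
        # One "epoch": move the parameters a fraction of the way to the target.
        mu += step * (real_mean - mu)
        sigma += step * (real_std - sigma)
        synthetic = [mu + sigma * z for z in base]
        gap = distribution_gap(real, synthetic)
        if gap < tolerance:
            return epoch, gap  # stopping criterion met before the epoch cap
    return max_epochs, gap  # cap reached; the model may still be underfit


real_data = [3.0, 7.0] * 50  # mean 5, standard deviation 2
epochs_run, final_gap = train_until_close(real_data)
print(f"stopped after {epochs_run} of 1000 epochs, gap={final_gap:.4f}")
```

With a small cap such as `max_epochs=10`, the loop in this sketch exhausts its budget before the gap falls under the tolerance, which illustrates why the note recommends a large `epochs` value when a stopping criterion is in place.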
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -8,7 +8,7 @@ description = "A novel method for generating tabular and relational data using l
authors = ["Aivin V. Solatorio <[email protected]>"]
readme = "README.md"
license = "MIT"
-version = "0.2.0"
+version = "0.2.1"
homepage = "https://github.com/avsolatorio/REaLTabFormer"
documentation = "https://worldbank.github.io/REaLTabFormer/"

2 changes: 1 addition & 1 deletion src/realtabformer/VERSION
@@ -1 +1 @@
-0.2.0
+0.2.1
4 changes: 2 additions & 2 deletions src/realtabformer/realtabformer.py
@@ -92,7 +92,7 @@ def __init__(
freeze_parent_model: Optional[bool] = True,
checkpoints_dir: str = "rtf_checkpoints",
samples_save_dir: str = "rtf_samples",
-epochs: int = 100,
+epochs: int = 1000,
batch_size: int = 8,
random_state: int = 1029,
train_size: float = 1,
@@ -121,7 +121,7 @@ def __init__(
frozen or not.
checkpoints_dir: Directory where the training checkpoints will be saved
samples_save_dir: Save the samples generated by this model in this directory.
-epochs: Number of epochs for training the GPT2LM model
+epochs: Number of epochs for training the GPT2LM model. Use a large number of epochs to take advantage of the framework's optimal termination feature for the non-relational tabular data model. Defaults to 1000.
batch_size: Batch size used for training. Must be adjusted based on the available
compute resource. TrainingArguments is set to use `gradient_accumulation_steps=4`
which will have an effective batch_size of 32 for the default value.
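
The effective batch size mentioned in the docstring is the product of the per-step batch size and the number of gradient accumulation steps. A small sketch of that arithmetic follows; the helper name `effective_batch_size` and the `num_devices` parameter are assumptions added for illustration and are not part of the library.

```python
def effective_batch_size(batch_size=8, gradient_accumulation_steps=4, num_devices=1):
    """Number of samples contributing to each optimizer update.

    `num_devices` is an illustrative assumption for multi-device training;
    the docstring's figure of 32 corresponds to a single device.
    """
    return batch_size * gradient_accumulation_steps * num_devices


# Default values from the docstring: 8 * 4 = 32
print(effective_batch_size())

# Halving batch_size to fit a smaller GPU halves the effective batch size
# unless gradient accumulation is increased to compensate.
print(effective_batch_size(batch_size=4))
print(effective_batch_size(batch_size=4, gradient_accumulation_steps=8))
```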