Update default epochs to 1000 and add note about optimal stopping criterion (#85)

* Update default epochs to 1000 and add note about optimal stopping criterion

Signed-off-by: Aivin V. Solatorio <[email protected]>

* Bump version to 0.2.1

Signed-off-by: Aivin V. Solatorio <[email protected]>

---------

Signed-off-by: Aivin V. Solatorio <[email protected]>
worldbank · Oct 18, 2024 · 650c6ca
1 parent 64e1b04
Showing 4 changed files with 11 additions and 4 deletions.
diff --git a/README.md b/README.md
@@ -30,6 +30,13 @@ pip install realtabformer
 
 We show examples of using the REaLTabFormer for modeling and generating synthetic data from a trained model.
 
+> [!INFO]
+> The model implements an optimal stopping criterion based on the synthetic data distribution when training a non-relational tabular model.
+> The model will stop training when the synthetic data distribution is close to the real data distribution.
+>
+> **Make sure to set the `epochs` parameter to a large number to allow the model to fit the data better.**
+> The model will stop training when the optimal stopping criterion is met.
+
 ### REaLTabFormer for regular tabular data
 
 

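The README note above says training stops once the stopping criterion is met, so `epochs` acts as an upper bound rather than a fixed training length. The following is a minimal sketch of that idea only, not REaLTabFormer's implementation: `train_with_stopping`, `toy_distance`, and the threshold value are hypothetical names and numbers chosen for illustration, and the toy distance stands in for whatever comparison the library makes between the synthetic and real data distributions.

```python
from typing import Callable


def train_with_stopping(
    max_epochs: int,
    distance_after_epoch: Callable[[int], float],
    threshold: float,
) -> int:
    """Run up to max_epochs and return the number of epochs actually used.

    Hypothetical helper: distance_after_epoch stands in for a measure of how
    far the synthetic data distribution is from the real one after an epoch.
    """
    for epoch in range(1, max_epochs + 1):
        # A real loop would train the model for one epoch here.
        if distance_after_epoch(epoch) <= threshold:
            return epoch  # criterion met, stop early
    return max_epochs  # budget exhausted before the criterion was met


def toy_distance(epoch: int) -> float:
    """Toy stand-in: the distance shrinks as training progresses."""
    return 1.0 / epoch


# With a budget of 100 epochs the criterion is never reached.
epochs_small = train_with_stopping(100, toy_distance, threshold=0.004)

# With a budget of 1000 epochs training stops as soon as the criterion is met.
epochs_large = train_with_stopping(1000, toy_distance, threshold=0.004)

print(epochs_small, epochs_large)
```

In this toy setup the smaller budget runs out before the criterion is reached, while the larger budget lets the criterion decide when to stop. That is the motivation the note gives for setting `epochs` to a large number.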
diff --git a/pyproject.toml b/pyproject.toml
@@ -8,7 +8,7 @@ description = "A novel method for generating tabular and relational data using l
 authors = ["Aivin V. Solatorio <[email protected]>"]
 readme = "README.md"
 license = "MIT"
-version = "0.2.0"
+version = "0.2.1"
 homepage = "https://github.com/avsolatorio/REaLTabFormer"
 documentation = "https://worldbank.github.io/REaLTabFormer/"
 

diff --git a/src/realtabformer/VERSION b/src/realtabformer/VERSION
@@ -1 +1 @@
-0.2.0
+0.2.1
diff --git a/src/realtabformer/realtabformer.py b/src/realtabformer/realtabformer.py
@@ -92,7 +92,7 @@ def __init__(
         freeze_parent_model: Optional[bool] = True,
         checkpoints_dir: str = "rtf_checkpoints",
         samples_save_dir: str = "rtf_samples",
-        epochs: int = 100,
+        epochs: int = 1000,
         batch_size: int = 8,
         random_state: int = 1029,
         train_size: float = 1,
@@ -121,7 +121,7 @@ def __init__(
                 frozen or not.
             checkpoints_dir:  Directory where the training checkpoints will be saved
             samples_save_dir: Save the samples generated by this model in this directory.
-            epochs: Number of epochs for training the GPT2LM model
+            epochs: Number of epochs for training the GPT2LM model. Use a large number of epochs to take advantage of the framework's optimal termination feature for the non-relational tabular data model. Defaults to 1000.
             batch_size: Batch size used for training. Must be adjusted based on the available
                 compute resource. TrainingArguments is set to use `gradient_accumulation_steps=4`
                 which will have an effective batch_size of 32 for the default value.
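The effective batch size mentioned in the `batch_size` docstring above can be checked with a short calculation. This sketch assumes a single training device; the variable names are illustrative and are not taken from the library's code.

```python
# Values stated in the signature and docstring above.
batch_size = 8                   # default `batch_size`
gradient_accumulation_steps = 4  # value the docstring says TrainingArguments uses

# Gradients from several batches are accumulated before each optimizer step,
# so one update sees batch_size * gradient_accumulation_steps samples
# (assuming one device).
effective_batch_size = batch_size * gradient_accumulation_steps

print(effective_batch_size)
```

Halving `batch_size` to fit a smaller GPU would halve the effective batch size as well, since the accumulation steps are fixed according to the docstring.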