
0.2.2

@achoum released this 15 Dec 17:13

Features

  • Surface the validation_interval_in_trees,
    keep_non_leaf_label_distribution and random_seed hyper-parameters (see
    the constructor sketch after this list).
  • Add the batch_size argument to the pd_dataframe_to_tf_dataset utility
    (see the dataset sketch after this list).
  • Automatically determine the number of threads if num_threads=None.
  • Add the try_resume_training constructor argument to facilitate resuming
    training.
  • Check that the training dataset is correctly configured for TF-DF, e.g.
    no repeat operation, a large enough batch size, etc. The check can be
    disabled with check_dataset=False.
  • When a model is created manually with the model builder and the dataspec
    is not provided, TF-DF tries to adapt the dataspec so that the model
    looks as if it was trained with the global imputation strategy for
    missing values (i.e. missing_value_policy: GLOBAL_IMPUTATION). This
    makes manually created models more likely to be compatible with the fast
    inference engines.
  • The fit method of TF-DF models now passes validation_data to the
    Yggdrasil learners. This is used, for example, for early stopping of GBT
    models.
  • Add the "loss" parameter of the GBT model directly in the model constructor.
  • Control the amount of training logs displayed in the notebook (when
    running in one) or in the console with the verbose constructor argument
    and fit parameter of the model.
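
For illustration, a minimal sketch of the new dataset and fit options
working together. The toy dataframes and column names are hypothetical;
only the batch_size argument of pd_dataframe_to_tf_dataset and the
validation_data forwarding come from the notes above:

```python
import pandas as pd
import tensorflow_decision_forests as tfdf

# Hypothetical toy data; any pandas DataFrame with a label column works.
train_df = pd.DataFrame({"f1": [0.1, 0.2, 0.3, 0.4], "label": [0, 1, 0, 1]})
valid_df = pd.DataFrame({"f1": [0.15, 0.35], "label": [0, 1]})

# batch_size is the argument added in this release.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(
    train_df, label="label", batch_size=64)
valid_ds = tfdf.keras.pd_dataframe_to_tf_dataset(
    valid_df, label="label", batch_size=64)

model = tfdf.keras.GradientBoostedTreesModel()

# validation_data is now forwarded to the Yggdrasil learner,
# e.g. for early stopping of the GBT model.
model.fit(train_ds, validation_data=valid_ds)
```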
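
And a sketch gathering the new constructor arguments in one place, assuming
they are all accepted by the GradientBoostedTreesModel constructor as the
bullets above suggest; all values are illustrative, not recommendations:

```python
import tensorflow_decision_forests as tfdf

model = tfdf.keras.GradientBoostedTreesModel(
    # Hyper-parameters surfaced in this release.
    validation_interval_in_trees=10,
    keep_non_leaf_label_distribution=False,
    random_seed=1234,
    # The GBT loss, now exposed directly in the constructor
    # (one of the Yggdrasil GBT losses).
    loss="BINOMIAL_LOG_LIKELIHOOD",
    # None (the default) lets TF-DF pick the number of threads.
    num_threads=None,
    # Allow resuming an interrupted training.
    try_resume_training=True,
    # Set to False to skip the training-dataset configuration checks.
    check_dataset=True,
    # Amount of training logs shown in the notebook or console.
    verbose=1,
)
```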

Fixes

  • num_candidate_attributes is no longer ignored when
    num_candidate_attributes_ratio=-1.
  • Use the median bucket split value strategy in the discretized numerical
    splitters (local and distributed).
  • Surface the max_num_scanned_rows_to_accumulate_statistics parameter to
    control how many examples are scanned to determine the feature statistics
    when training from a file dataset with fit_on_dataset_path (see the
    sketch below).
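
A sketch of this parameter in use; the file path and label column are
placeholders, and the other fit_on_dataset_path arguments shown are
assumptions about its signature rather than confirmed by the note above:

```python
import tensorflow_decision_forests as tfdf

model = tfdf.keras.GradientBoostedTreesModel()

# Train directly from a file dataset. The new parameter bounds how many
# examples are scanned to compute the feature statistics (the dataspec).
model.fit_on_dataset_path(
    train_path="/path/to/train.csv",  # placeholder path
    label_key="label",                # placeholder label column
    dataset_format="csv",
    max_num_scanned_rows_to_accumulate_statistics=100_000,
)
```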