
Update tests after feature selection change #1213

Conversation

@ahuber21 (Contributor) commented Mar 14, 2023:

uxlfoundation/oneDAL#2292 introduces changes that need to be reflected on the scikit-learn-intelex side:

  • Add the useConstFeatures algorithm option
    • The previously hardcoded value can now be set as a parameter; it defaults to the former hardcoded value, false (a hedged usage sketch follows the commit message below).
  • Feature sampling was changed: sampling the features considered for the best node split was optimized, which results in different numerical values in the predictions. The overall result of the ensemble is still the same.
// from the commit message
    chore: update unit test reference values
    
    details: feature selection for node splitting was changed which results in
    different numerics for the prediction. mean and variance are still in good
    agreement
    
    mean:
      old -> 22.088
      new -> 22.104
    
    variance:
      old -> 49.4695
      new -> 49.4311
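
A hedged sketch of how the new option could be passed through daal4py once it is built against the updated oneDAL. Only useConstFeatures comes from this PR; the surrounding parameters and the compute() call are illustrative, following the usual daal4py batch-training pattern.

import daal4py as d4p

# Assumption: the training algorithm accepts the new flag once daal4py is built
# against oneDAL containing uxlfoundation/oneDAL#2292; False reproduces the
# previously hardcoded behavior.
train_algo = d4p.decision_forest_regression_training(
    nTrees=100,
    varImportance='MDA_Raw',
    useConstFeatures=False,  # previously hardcoded, now exposed as a parameter
)
# train_result = train_algo.compute(X_train, y_train)  # X_train/y_train: your data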

Edit:
During development I ran a RandomForestClassifier and it still produces similar results (a hedged reproduction sketch follows the timing numbers below):

Accuracy
  old -> 0.9313
  new -> 0.9316

Confusion matrix
  old
    [[6922    4]
     [ 511   63]]

  new
    [[6921    5]
     [ 508   66]]

However, the training time is greatly improved

old: 433.87 seconds
new: 17.25 seconds
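
A hedged sketch of the kind of comparison behind the numbers above. The dataset, estimator parameters, and timing code are illustrative assumptions; the PR does not state which data was used.

import time
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearnex.ensemble import RandomForestClassifier  # oneDAL-accelerated estimator

# Synthetic stand-in data; the actual dataset behind the figures above is not given.
rng = np.random.default_rng(0)
X = rng.random((30000, 20)).astype(np.float32)
y = (X[:, 0] + X[:, 1] > 1.2).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

start = time.time()
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(f"training time: {time.time() - start:.2f} s")

pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("confusion matrix:")
print(confusion_matrix(y_test, pred))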

@Alexsandruss (Contributor) left a comment:

sklearnex/preview/ensemble/forest wrappers should be updated too

maxLeafNodes=0 if self.max_leaf_nodes is None else self.max_leaf_nodes,
maxBins=self.maxBins,
minBinSize=self.minBinSize,
useConstFeatures=self.useConstFeatures,
Contributor:

daal4py wrappers for previous oneDAL versions don't have the useConstFeatures arg. Use daal_check_version for branching by oneDAL version.
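
A hedged sketch of the suggested branching. The helper name and the surrounding wrapper structure are hypothetical; daal_check_version and the keyword names appear elsewhere in this PR.

from daal4py.sklearn._utils import daal_check_version


def _forest_training_kwargs(estimator):
    # Hypothetical helper: assemble the daal4py training keywords used by the
    # sklearnex/preview/ensemble/forest wrappers.
    kwargs = dict(
        maxLeafNodes=0 if estimator.max_leaf_nodes is None else estimator.max_leaf_nodes,
        maxBins=estimator.maxBins,
        minBinSize=estimator.minBinSize,
    )
    # Older daal4py builds do not accept useConstFeatures, so gate it on the oneDAL
    # version (the exact gate value is an assumption based on this PR's test changes).
    if daal_check_version((2023, 'P', 101)):
        kwargs['useConstFeatures'] = estimator.useConstFeatures
    return kwargs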

@@ -1,127 +1,127 @@
36.70242652
Contributor:

What is the need for changing the data file?

@ahuber21 (Contributor, Author):

See the PR description. Node splitting will be changed with uxlfoundation/oneDAL#2292, resulting in different values. As far as I can tell, the algorithm is still producing correct results.

@ahuber21 force-pushed the update-tests-after-feature-selection-change branch from 49b9f1f to 6ef0a5d on March 23, 2023 15:08
@ahuber21 force-pushed the update-tests-after-feature-selection-change branch from 6ef0a5d to 15b1890 on March 23, 2023 15:09
@Alexsandruss (Contributor) commented:

CI is based on the previous oneDAL release, so the unit test file should be dispatched based on the oneDAL version for green CI.
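
A hedged sketch of such a dispatch, reusing the reference statistics quoted in the PR description above. The exact version gate and the test structure are assumptions.

from daal4py.sklearn._utils import daal_check_version

# Pick the reference values that match the installed oneDAL build so that CI running
# against the previous release keeps passing.
if daal_check_version((2023, 'P', 101)):
    expected_mean, expected_variance = 22.104, 49.4311  # new feature selection
else:
    expected_mean, expected_variance = 22.088, 49.4695  # previous oneDAL releases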

@ahuber21 force-pushed the update-tests-after-feature-selection-change branch 2 times, most recently from 629c9eb to 99ef708 on March 24, 2023 10:10
@@ -33,7 +33,12 @@

ACCURACY_RATIO = 0.95 if daal_check_version((2021, 'P', 400)) else 0.85
MSE_RATIO = 1.07
LOG_LOSS_RATIO = 1.4 if daal_check_version((2021, 'P', 400)) else 1.55
if daal_check_version((2023, 'P', 101)):
    LOG_LOSS_RATIO = 1.5
Contributor:

What has happened here?

@ahuber21 (Contributor, Author):

It is discussed in uxlfoundation/oneDAL#2292 and I am investigating the issue. I think my changes make an existing bug even more pronounced for one particular test case, but I would really like to tackle speedup and accuracy in separate PRs.

@ahuber21 force-pushed the update-tests-after-feature-selection-change branch from 99ef708 to cc22197 on March 24, 2023 14:04
@ahuber21 force-pushed the update-tests-after-feature-selection-change branch from cc22197 to b6ba186 on March 24, 2023 15:13
Comment on lines 185 to +193
'decision_forest_regression_batch.csv', lambda r: r[1].prediction, (2023, 'P', 1)),
('decision_forest_regression_hist_batch',
'decision_forest_regression_batch.csv', lambda r: r[1].prediction, (2023, 'P', 1)),
('decision_forest_regression_default_dense_batch',
'decision_forest_regression_batch_20230101.csv',
lambda r: r[1].prediction, (2023, 'P', 101)),
('decision_forest_regression_hist_batch',
'decision_forest_regression_batch_20230101.csv',
lambda r: r[1].prediction, (2023, 'P', 101)),
Contributor:

This set of examples will fail when the version updates from 2023.1.0 to 2023.1.1 because two sets of regression examples will be run.
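
A hedged illustration of the concern: the version tuples act as minimum-version gates, so on oneDAL 2023.1.1 and later both the old and the new regression entries match. One hypothetical way around it (not the test runner's actual mechanism) is to resolve a single reference CSV per build:

from daal4py.sklearn._utils import daal_check_version

# Hypothetical single-entry dispatch: choose the reference CSV by oneDAL version
# instead of listing both generations of the regression examples.
reference_csv = (
    'decision_forest_regression_batch_20230101.csv'
    if daal_check_version((2023, 'P', 101))
    else 'decision_forest_regression_batch.csv'
)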

@ahuber21 merged commit ef57673 into uxlfoundation:master on Mar 24, 2023
@ahuber21 deleted the update-tests-after-feature-selection-change branch on March 24, 2023 18:12