speedup auto_regression #290

mathause · 2023-09-18T16:14:41Z

Some profiling fun:

I tried to figure out why the tests in tests/integration/test_calibrate_mesmer.py are slowish. One main culprit is _fit_auto_regression_np and statsmodels: we need to set up the AutoReg model for each grid point individually, and it checks the input every time. Setting up the DeterministicProcess is especially slow. We can instantiate it once and pass it in, which speeds up the tests by about 25% (from 12s to 8s or so). However, this does not seem to be recommended and is brittle (it copies implementation details), so I do not recommend merging this PR. It would still be nice to speed up this unnecessarily slow, repeated setup.

The diff used for profiling is given below; the resulting profile.out can then be visualized with snakeviz:

diff --git a/tests/integration/test_calibrate_mesmer.py b/tests/integration/test_calibrate_mesmer.py
index d374bce..04bf7b8 100644
--- a/tests/integration/test_calibrate_mesmer.py
+++ b/tests/integration/test_calibrate_mesmer.py
@@ -47,19 +47,31 @@ def test_calibrate_mesmer(
         "auxiliary",
     )

-    _calibrate_tas(
-        esms=test_esms,
-        scenarios_to_train=test_scenarios_to_train,
-        threshold_land=test_threshold_land,
-        output_file=test_output_file,
-        cmip_data_root_dir=test_cmip_data_root_dir,
-        cmip_generation=test_cmip_generation,
-        observations_root_dir=test_observations_root_dir,
-        auxiliary_data_dir=test_auxiliary_data_dir,
-        # save params as well - they are .gitignored
-        save_params=update_expected_files,
-        params_output_dir=params_output_dir,
-    )
+    from cProfile import Profile
+    from pstats import SortKey, Stats
+
+    with Profile() as profile:
+
+        _calibrate_tas(
+            esms=test_esms,
+            scenarios_to_train=test_scenarios_to_train,
+            threshold_land=test_threshold_land,
+            output_file=test_output_file,
+            cmip_data_root_dir=test_cmip_data_root_dir,
+            cmip_generation=test_cmip_generation,
+            observations_root_dir=test_observations_root_dir,
+            auxiliary_data_dir=test_auxiliary_data_dir,
+            # save params as well - they are .gitignored
+            save_params=update_expected_files,
+            params_output_dir=params_output_dir,
+        )
+
+        (
+            Stats(profile)
+            .strip_dirs()
+            .sort_stats(SortKey.CUMULATIVE)
+            .dump_stats("profile.out")
+        )

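from cProfile import Profile
from pstats import SortKey, Stats

# profile everything executed inside the with block (Python >= 3.8)
with Profile() as profile:
    function()  # placeholder for the code to profile

# write the collected stats to disk once profiling has stopped
(
    Stats(profile)
    .strip_dirs()
    .sort_stats(SortKey.CUMULATIVE)
    .dump_stats("profile.out")
)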

The resulting profile.out can be visualized with snakeviz (snakeviz profile.out).
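If snakeviz is not at hand, the dumped file can also be inspected with the standard library's `pstats`. The following is a self-contained sketch: `slow_setup` and `work` are hypothetical stand-ins for the code under test, and the profile is written to a temporary directory rather than to `profile.out` in the working directory.

```python
import cProfile
import pstats
import tempfile
from pathlib import Path
from pstats import SortKey


def slow_setup():
    # stand-in for an expensive setup step that is repeated unnecessarily
    return sum(i * i for i in range(50_000))


def work():
    return [slow_setup() for _ in range(20)]


out = str(Path(tempfile.mkdtemp()) / "profile.out")

with cProfile.Profile() as profile:
    work()

# dump the raw stats to disk, as in the snippet above
pstats.Stats(profile).dump_stats(out)

# load the file again and print the five most expensive entries
stats = pstats.Stats(out)
stats.strip_dirs().sort_stats(SortKey.CUMULATIVE).print_stats(5)

# number of calls per function name, read from the loaded stats
calls = {func[2]: nc for func, (cc, nc, tt, ct, callers) in stats.stats.items()}
```

Sorting by cumulative time puts functions that are slow including their callees at the top, which is how a repeated setup step hidden behind a cheap-looking call tends to show up.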

speedup auto_regression

1f29548

mathause commented Sep 18, 2023

Review comment on mesmer/stats/auto_regression.py (outdated, resolved)

make tests pass

e332deb

mathause mentioned this pull request Sep 18, 2023

_calibrate_tas: allow selecting predictors #291

Merged

4 tasks

mathause added 3 commits December 13, 2023 17:19

Merge branch 'main' into speedup_auto_regression

c51b278

Merge branch 'speedup_auto_regression' of https://github.com/mathause…

d17acee

…/mesmer into speedup_auto_regression

update comment

af9f7cb

mathause mentioned this pull request Jul 25, 2024

replace statsmodels' AutoReg with OLS? #483

Open
