speedup auto_regression #290

Draft pull request: 5 commits to be merged into base branch main.

mathause (Member) commented Sep 18, 2023

Some profiling fun:

I tried to figure out why the tests in tests/integration/test_calibrate_mesmer.py are slowish. One main culprit is _fit_auto_regression_np together with statsmodels: we need to set up the AutoReg model for each grid point individually, and it validates its input every time; setting up the DeterministicProcess is especially slow. We can instantiate the DeterministicProcess once and pass it in. This speeds up the tests by about 25% (from 12s to 8s or so). However, this does not seem to be recommended and is brittle (it copies implementation details). So I do not recommend merging this PR (it would still be nice to speed up this unnecessarily slow, repeated setup...).

The diff used for profiling is given below (the resulting profile.out can then be visualized using snakeviz):

diff --git a/tests/integration/test_calibrate_mesmer.py b/tests/integration/test_calibrate_mesmer.py
index d374bce..04bf7b8 100644
--- a/tests/integration/test_calibrate_mesmer.py
+++ b/tests/integration/test_calibrate_mesmer.py
@@ -47,19 +47,31 @@ def test_calibrate_mesmer(
         "auxiliary",
     )

-    _calibrate_tas(
-        esms=test_esms,
-        scenarios_to_train=test_scenarios_to_train,
-        threshold_land=test_threshold_land,
-        output_file=test_output_file,
-        cmip_data_root_dir=test_cmip_data_root_dir,
-        cmip_generation=test_cmip_generation,
-        observations_root_dir=test_observations_root_dir,
-        auxiliary_data_dir=test_auxiliary_data_dir,
-        # save params as well - they are .gitignored
-        save_params=update_expected_files,
-        params_output_dir=params_output_dir,
-    )
+    from cProfile import Profile
+    from pstats import SortKey, Stats
+
+    with Profile() as profile:
+
+        _calibrate_tas(
+            esms=test_esms,
+            scenarios_to_train=test_scenarios_to_train,
+            threshold_land=test_threshold_land,
+            output_file=test_output_file,
+            cmip_data_root_dir=test_cmip_data_root_dir,
+            cmip_generation=test_cmip_generation,
+            observations_root_dir=test_observations_root_dir,
+            auxiliary_data_dir=test_auxiliary_data_dir,
+            # save params as well - they are .gitignored
+            save_params=update_expected_files,
+            params_output_dir=params_output_dir,
+        )
+
+        (
+            Stats(profile)
+            .strip_dirs()
+            .sort_stats(SortKey.CUMULATIVE)
+            .dump_stats("profile.out")
+        )
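
In generic form, the profiling pattern is:

from cProfile import Profile
from pstats import SortKey, Stats

with Profile() as profile:

    # placeholder for the call to be profiled
    function()

    # write the collected statistics to disk for later inspection
    (
        Stats(profile)
        .strip_dirs()
        .sort_stats(SortKey.CUMULATIVE)
        .dump_stats("profile.out")
    )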

The visualization can be done with snakeviz (e.g. by running snakeviz profile.out).
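
If snakeviz is not at hand, the dumped file can also be read as plain text with the standard-library pstats module. The following is only a minimal sketch of that idea: slow_setup and work are hypothetical stand-ins for the code under test, and the helper names profile_to_file and top_cumulative are made up for illustration.

```python
import io
import os
import tempfile
from cProfile import Profile
from pstats import SortKey, Stats


def slow_setup(n):
    # hypothetical stand-in for expensive setup that is repeated per grid point
    return sum(i * i for i in range(n))


def work():
    # hypothetical stand-in for the profiled call (e.g. the calibration)
    return [slow_setup(2_000) for _ in range(20)]


def profile_to_file(func, filename):
    """Profile ``func`` and dump the raw statistics to ``filename``."""
    with Profile() as profile:
        func()
    Stats(profile).dump_stats(filename)


def top_cumulative(filename, limit=10):
    """Return a text report of the ``limit`` entries with the largest cumulative time."""
    stream = io.StringIO()
    stats = Stats(filename, stream=stream)
    stats.strip_dirs().sort_stats(SortKey.CUMULATIVE).print_stats(limit)
    return stream.getvalue()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "profile.out")
    profile_to_file(work, path)
    print(top_cumulative(path))
```

Sorting by cumulative time tends to surface call chains such as the repeated model setup described above, since the time spent in callees is attributed to their callers.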
