
Manual/semi-automatic performance regression checking #356

Merged

Conversation

MarionBWeinzierl
Collaborator

@MarionBWeinzierl MarionBWeinzierl commented Nov 12, 2024

Description

This branch includes code to run profiling on PyRealm and to check whether performance has degraded between two commits.

Fixes #256

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@MarionBWeinzierl MarionBWeinzierl self-assigned this Nov 12, 2024
@MarionBWeinzierl MarionBWeinzierl linked an issue Nov 12, 2024 that may be closed by this pull request
@MarionBWeinzierl MarionBWeinzierl marked this pull request as draft November 12, 2024 17:13
@codecov-commenter

codecov-commenter commented Nov 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.09%. Comparing base (1f315ba) to head (955528d).
Report is 469 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #356      +/-   ##
===========================================
+ Coverage    95.29%   96.09%   +0.80%     
===========================================
  Files           28       35       +7     
  Lines         1720     2766    +1046     
===========================================
+ Hits          1639     2658    +1019     
- Misses          81      108      +27     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@MarionBWeinzierl
Collaborator Author

MarionBWeinzierl commented Nov 18, 2024

The scripts currently lack any error checking etc., but before I do more, here are some questions, @davidorme:

  1. Is it ok to have this as a bash script calling Python, or would you rather have a Python script (potentially calling some bash for the git parts)?
  2. This script uses the existing pytest-profile functionality. As you mentioned on Slack, we could also just call cProfile. I am wondering whether this affects the precision of the results.
  3. It seems that there is a bit of fluctuation that we would need to account for with a tolerance and/or larger datasets to reduce the impact of overhead (or maybe we need to focus on certain core functions?). Running the current script on the same two commit hashes three times gives, for example, the following results:

[screenshots: timing comparison output from three runs on the same pair of commits]

@davidorme
Collaborator

The scripts currently lack any error checking etc., but before I do more, here are some questions, @davidorme:

  1. Is it ok to have this as a bash script calling Python, or would you rather have a Python script (potentially calling some bash for the git parts)?

I think it would be better to do this all in Python (maybe using GitPython?) but that's mostly to avoid issues for currently hypothetical Windows-based developers. So, for the moment, I think this is fine.
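For what it's worth, a minimal sketch of what the git-handling part could look like with GitPython. The function name, output handling and the exact pytest invocation are placeholders, and a real script would also need the error checking mentioned above:

```python
import shutil
import subprocess

from git import Repo  # GitPython


def profile_commit(repo_path: str, commit: str, outfile: str) -> None:
    """Check out a commit, run the profiling tests and keep the combined profile."""
    repo = Repo(repo_path)
    original = repo.head.commit.hexsha  # remember where we started
    try:
        repo.git.checkout(commit)
        # Run the profiling test suite; pytest-profiling writes .prof files.
        subprocess.run(
            ["poetry", "run", "pytest", "-m", "profiling", "--profile"],
            cwd=repo_path,
            check=True,
        )
        # pytest-profiling stores its merged output as prof/combined.prof by default.
        shutil.copy(f"{repo_path}/prof/combined.prof", outfile)
    finally:
        repo.git.checkout(original)  # always restore the original checkout
```

Called once per commit hash, this would leave two .prof files to compare afterwards.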

  2. This script uses the existing pytest-profile functionality. As you mentioned on Slack, we could also just call cProfile. I am wondering whether this affects the precision of the results.

I don't think it should. IIUC, pytest-profiling just wraps exactly the same operation.
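In other words, something like the sketch below (not the script as it stands) should give essentially the same numbers as the pytest-profile run, since both routes go through cProfile:

```python
import cProfile
import pstats

import pytest

# Profile the same test selection that the script exercises via pytest-profiling.
profiler = cProfile.Profile()
profiler.enable()
pytest.main(["-m", "profiling"])
profiler.disable()

# Print the top 20 functions by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```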

  3. It seems that there is a bit of fluctuation that we would need to account for with a tolerance and/or larger datasets to reduce the impact of overhead (or maybe we need to focus on certain core functions?). Running the current script on the same two commit hashes three times gives, for example, the following results:

Yeah - I think that increasing the load is probably the way to go, at least initially. The other thing is to dive deeper into the results to find which functions have changed, but that's probably a new PR.
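For that later PR, one possible way to find which functions changed would be to compare per-function cumulative times between two saved profiles using pstats. A rough sketch only; the file names and the 10% threshold are placeholders:

```python
import pstats


def per_function_times(path: str) -> dict:
    """Map (file, line, function) tuples to cumulative time from a saved profile."""
    # Raw entries are (primitive calls, total calls, tottime, cumtime, callers).
    raw = pstats.Stats(path).stats
    return {func: cumtime for func, (cc, nc, tt, cumtime, callers) in raw.items()}


old = per_function_times("old.prof")
new = per_function_times("new.prof")

# Flag functions whose cumulative time grew by more than 10%, slowest first.
for func, old_time in sorted(old.items(), key=lambda item: -item[1]):
    new_time = new.get(func)
    if new_time is not None and old_time > 0 and new_time > 1.1 * old_time:
        print(f"{func}: {old_time:.3f}s -> {new_time:.3f}s")
```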

@MarionBWeinzierl
Collaborator Author

Talking about overheads... I noticed that the most expensive parts in the list were those associated with running pytest itself:
[screenshot: profiling output with the pytest machinery at the top of the list]

I had assumed, when I created the simplified version, that these parts would be roughly similar in each run and therefore would not need excluding, but I was wrong. So I reinstated the exclude parameter to make sure those are thrown out. This is what I get now, running the same thing three times:
[screenshots: timing comparison output from three runs with the pytest overhead excluded]

Given that the two commits I am comparing are quite similar, I would argue that we might still want a tolerance of, say, 5% or so.
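As a concrete illustration of that kind of check, here is a sketch that compares the total runtimes of two saved profiles against a 5% tolerance. The file names and the exact threshold are placeholders, not decisions:

```python
import pstats
import sys

TOLERANCE = 0.05  # fail if the new profile is more than 5% slower overall

old_total = pstats.Stats("old.prof").total_tt
new_total = pstats.Stats("new.prof").total_tt

if new_total > old_total * (1 + TOLERANCE):
    print(f"Possible performance regression: {old_total:.2f}s -> {new_total:.2f}s")
    sys.exit(1)

print(f"Within tolerance: {old_total:.2f}s -> {new_total:.2f}s")
```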

@MarionBWeinzierl MarionBWeinzierl marked this pull request as ready for review November 25, 2024 14:14
@MarionBWeinzierl
Collaborator Author

Do we want this in the CI (falling over if the code gets 5% slower), or as a manual thing?

local profiling and benchmarking.

See the [profiling and benchmarking page](https://pyrealm.readthedocs.io/en/latest/development/profiling_and_benchmarking.md)
Collaborator Author

That link does not work. Looking at https://github.com/ImperialCollegeLondon/pyrealm/blob/develop/docs/source/development/profiling_and_benchmarking.md , a lot of that information is also no longer valid. We will need to adapt that when we have decided how to proceed (i.e., whether to keep the old benchmarking code, if and when to run the new code automatically, etc.).

Collaborator Author

@davidorme would you prefer me to use this to replace the old run_benchmarking.py, or to leave the old script in and keep this as simple_benchmarking.py? That decision will also influence how the above-mentioned documentation is extended or adapted/rewritten.

CONTRIBUTING.md (outdated review thread, resolved)
@j-emberton
Collaborator

j-emberton commented Dec 2, 2024

This issue was found when trying to run ./performance_regression_checking.sh locally on an M2 Mac.

[screenshot of the error output]

Appears to be a Mac/Linux incompatibility.

The specific issue was fixed by converting
poetry run /usr/bin/time -v pytest -m "profiling" --profile-svg
to
poetry run /usr/bin/time -l pytest -m "profiling" --profile-svg
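If the wrapper eventually moves to Python, as discussed above, one way to sidestep this would be to pick the flag from the platform, since GNU time on Linux takes -v while BSD time on macOS takes -l. A rough sketch only:

```python
import platform
import subprocess

# BSD time (macOS) uses -l for resource usage; GNU time (Linux) uses -v.
time_flag = "-l" if platform.system() == "Darwin" else "-v"

subprocess.run(
    ["poetry", "run", "/usr/bin/time", time_flag,
     "pytest", "-m", "profiling", "--profile-svg"],
    check=True,
)
```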

@davidorme
Collaborator

davidorme commented Dec 3, 2024

Something is up with the profiling in the CI. The profiling YAML is re-running the whole of pyrealm_ci.yaml under the test job. I think we can probably simply add the profiling job as a new job under the docs job in pyrealm_ci.yaml, which avoids having to import the previous workflow and ensures that the previous testing runs?

Also, the purpose of this job is just to check that the profiling script works? We only need it to run, so we can drop all of the graphviz stuff from the job and make it faster and cleaner.

I've pushed an update to set up what I think works? We'll see if it does!

@davidorme
Collaborator

I think that's cleaner? It's not really doing a different workflow, it's adding a new CI test to the standard test and build suite, so it seems reasonable to just add it there.

With that change, I think we can delete:

.github/workflows/pyrealm_profiling_after_push.yaml
.github/workflows/pyrealm_profiling_on_approve.yaml
.github/workflows/pyrealm_profiling_without_benchmarking.yaml

Collaborator

@davidorme davidorme left a comment

LGTM - this works for me and I think it is a much saner approach. We need to work on the docs, and possibly on moving things into Python and adding functionality, but this gets us back on track.

CONTRIBUTING.md (outdated review thread, resolved)
@MarionBWeinzierl
Collaborator Author

I have created #358 as a follow-up regarding the old code and documentation. I am merging this in now, as an incremental improvement.

@MarionBWeinzierl MarionBWeinzierl merged commit fda0906 into develop Dec 3, 2024
13 checks passed
@MarionBWeinzierl MarionBWeinzierl deleted the issue256-manual-performance-regression-checking branch December 3, 2024 13:49
Development

Successfully merging this pull request may close these issues.

Develop manual performance regression checking
4 participants