-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inlining everything #183
Inlining everything #183
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #183 +/- ##
==========================================
+ Coverage 92.91% 92.96% +0.04%
==========================================
Files 10 10
Lines 1144 1152 +8
==========================================
+ Hits 1063 1071 +8
Misses 81 81 ☔ View full report in Codecov by Sentry. |
6ad8b50
to
859c3bc
Compare
859c3bc
to
492967a
Compare
I took a look at the buoyancy kernel, and I'm seeing a speedup on this branch (on the CPU): Main branch ******************* thermo_kernel
BenchmarkTools.Trial: 236 samples with 1 evaluation.
Range (min … max): 20.429 ms … 28.884 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 20.910 ms ┊ GC (median): 0.00%
Time (mean ± σ): 21.210 ms ± 876.287 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▄▃█▆▁
▃▄███████▅▄▃▄▅▂▃▄▃▁▂▂▁▂▃▃▄▃▄▂▂▃▂▃▃▃▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ ▃
20.4 ms Histogram: frequency by time 24.8 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
******************* buoyancy_gradients_kernel
BenchmarkTools.Trial: 27 samples with 1 evaluation.
Range (min … max): 180.741 ms … 196.250 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 183.876 ms ┊ GC (median): 0.00%
Time (mean ± σ): 185.915 ms ± 4.569 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
▃ ▃ ██▃
▇█▁▁▇▁█▁▁▁███▇▇▇▇▁▁▇▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▇▁▁▇▁▇▁▁▁▁▁▁▁▇▁▁▁▁▁▇▁▇ ▁
181 ms Histogram: frequency by time 196 ms <
Memory estimate: 16 bytes, allocs estimate: 1. This branch: ******************* thermo_kernel
BenchmarkTools.Trial: 368 samples with 1 evaluation.
Range (min … max): 13.501 ms … 14.498 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 13.544 ms ┊ GC (median): 0.00%
Time (mean ± σ): 13.592 ms ± 143.209 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁██▇▆▅▆▄▃▃▂▁
████████████▇▆▁▁▄▄▁▁▁▆▁▇▄▁▆▁▁▆▄▁▁▁▁▄▄▁▄▁▄▄▁▁▁▄▁▁▁▁▁▁▄▆▆▇▆▄▄▆ ▇
13.5 ms Histogram: log(frequency) by time 14.1 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
******************* buoyancy_gradients_kernel
BenchmarkTools.Trial: 36 samples with 1 evaluation.
Range (min … max): 138.371 ms … 142.465 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 139.539 ms ┊ GC (median): 0.00%
Time (mean ± σ): 139.659 ms ± 890.868 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ █▂ ▂
▅▁▅▁▅▅█▅▅▁▅▅▁█▅▅▅██▁▁█▁▅▅▁▅▅▁▁▁▁▁▁▁▅▁▅▁▁▁▁▁▁▁▁▅▁▁▁▅▁▁▁▁▁▁▁▁▁▅ ▁
138 ms Histogram: frequency by time 142 ms <
Memory estimate: 16 bytes, allocs estimate: 1. I'm not sure if the input parameters are performing saturation adjustment, but it doesn't matter in the case of the buoyancy gradients kernel, since SA is never called regardless of what the inputs are. cc @glwagner, @szy21. I'm comfortable moving forward with this. Interestingly, I didn't realize this until now, it looks like the buoyancy kernel is allocating 🤔 |
This looks great! So maybe it will speed up the quadratures also... |
Yeah, I was thinking that, too |
Good news. |
492967a
to
ecfb71d
Compare
Also, from the updated benchmark, which I think may reflect the impact better for the CPU (we won't see differences on the gpu where we use main BenchmarkTools.Trial: 1273 samples with 1 evaluation.
Range (min … max): 3.875 ms … 4.541 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 3.881 ms ┊ GC (median): 0.00%
Time (mean ± σ): 3.928 ms ± 93.551 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▃▁ ▃▃▁ ▃
███▆███▇████▆▆▅▇▆▆▃▅▅▄▇█▇█▅▅▇▆▆▅▅▄▁▁▄▄▁▃▄▃▃▄▅▁▁▁▁▄▁▃▅▃▁▁▁▄ ▇
3.87 ms Histogram: log(frequency) by time 4.34 ms <
Memory estimate: 544 bytes, allocs estimate: 10.
BenchmarkTools.Trial: 1390 samples with 1 evaluation.
Range (min … max): 3.563 ms … 4.238 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 3.569 ms ┊ GC (median): 0.00%
Time (mean ± σ): 3.596 ms ± 74.865 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▂ ▄▃▁
██▇████▇█▇▇█▇▄▆▆▇▆▅▅▆▇▄▇▅▄▅▃▅▃▃▅▁▁▃▃▁▁▁▃▁▃▁▁▃▁▃▁▁▁▃▁▃▁▁▄▁▄ ▇
3.56 ms Histogram: log(frequency) by time 4.03 ms <
Memory estimate: 0 bytes, allocs estimate: 0. This branch: BenchmarkTools.Trial: 1690 samples with 1 evaluation.
Range (min … max): 2.874 ms … 3.264 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.965 ms ┊ GC (median): 0.00%
Time (mean ± σ): 2.957 ms ± 40.091 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▄ █▁▃
▂▂▂▁▂▄▆▃▃▄▄▄▄▅▄▃▆▄██▆▃▃▃▂▃▅▄▂▅███▄▃▃▃▄▄▇█▃▅▆▄▅▃▂▃▂▂▂▁▁▁▁▁▁ ▃
2.87 ms Histogram: frequency by time 3.05 ms <
Memory estimate: 544 bytes, allocs estimate: 10.
BenchmarkTools.Trial: 1817 samples with 1 evaluation.
Range (min … max): 2.626 ms … 3.289 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.735 ms ┊ GC (median): 0.00%
Time (mean ± σ): 2.750 ms ± 69.871 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▄ ▃▄█▅▁ ▁▁▅
▁▂▃▂▂▅▄▅▅▇██▇██████████▆▅▅▇▇▅▃▅▆▆▅▃▄▅▄▅▅▃▂▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
2.63 ms Histogram: frequency by time 2.94 ms <
Memory estimate: 0 bytes, allocs estimate: 0. |
This PR attempts to systematically inline all thermo functions, to see the impact of performance.