WIP - Reentrant concurrent snoopi_deep profiles. #309
Conversation
Note that of course one must be careful when interpreting the results of concurrently running profiles, since inference caused by invocation B will also show up in the profile for invocation A. But this is true of all of Julia's profiling and timing techniques right now, so it's just something the developer must be aware of.
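To make that cross-attribution concrete, here is a minimal, hypothetical sketch (in Python for brevity; none of these names come from SnoopCompile): two profiles sharing one global event buffer each see events the other triggers while both windows are open.

```python
# Hypothetical sketch (not SnoopCompile's API): a profile is just a window
# over a shared, append-only event buffer, so events appended by a concurrent
# invocation while the window is open land in this profile too.
events = []  # shared inference-event buffer

def profiled(body, log):
    """Record the slice of the shared buffer covering `body`'s execution."""
    start = len(events)
    body()
    log.append(events[start:])

logs_a, logs_b = [], []

def invocation_a():
    events.append("inference-for-A")
    # While A's profile is still open, B runs and appends its own events:
    profiled(lambda: events.append("inference-for-B"), logs_b)

profiled(invocation_a, logs_a)
print(logs_a[0])  # ['inference-for-A', 'inference-for-B']: B's event shows up in A's profile
```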
Codecov Report: Base: 85.47% // Head: 17.39% // Decreases project coverage by 68.09%.
Additional details and impacted files:

```
@@            Coverage Diff            @@
##           master     #309      +/-   ##
===========================================
- Coverage   85.47%   17.39%   -68.09%
===========================================
  Files          17       16        -1
  Lines        2162     1903      -259
===========================================
- Hits         1848      331     -1517
- Misses        314     1572     +1258
```
☔ View full report at Codecov.
Update: it turns out that JuliaLang/julia#47258 and JuliaLang/julia#47260 are no longer valid, now that we're shrinking the inference lock (which is 🎉 in general, of course!). So this PR will have to wait, because we'll probably need to rethink the inference profiler entirely, given that it currently assumes single-threaded, locked access to the inference profile buffer. :/
CC: @Sacha0 for review plz
```diff
@@ -863,6 +863,7 @@ end
     # pgdsgui(axs[2], rit; bystr="Inclusive", consts=true, interactive=false)
 end
```
Snuck in? :)
```julia
_name(frame::SnoopCompileCore.InferenceTiming) = frame.mi_info.mi.def.name
```

```julia
@testset "reentrant concurrent profiles 1 - overlap" begin
    # Warmup
```
Perhaps something like the following? :)
```diff
-# Warmup
+# Warmup to prevent, e.g., `+` from appearing in the profiles; only `ROOT` and `foo*` should appear.
```
(Likewise on the warmup comments below if you agree.)
```julia
@eval foo1(x) = x+2
@eval foo1(2)

# Test:
```
That this works is thoroughly pleasing.
```julia
sleep((i-1) / 10) # (Divide by 10 so the test isn't too slow)
snoop_time = @timed SnoopCompile.@snoopi_deep @eval begin
    $(Symbol("foo$i"))(x) = x + 1
    sleep(1.5 / 10)
```
I hope that these delays are large enough (and I think they should be, by orders of magnitude) to prevent nondeterministic failures due to jitter. But I've found the machines are just lying in wait for the moment I think something like that. Then 🦈 🩸.
```julia
# Test that the inclusive time (the total time reported by snoopi_deep) matches
# the actual time to do the snoopi_deep, as measured by `@time`.
# These should both be approximately ~0.15 seconds.
@info prof.mi_timing.inclusive_time
```
Debugging straggler? :)
```julia
@testset for i in eachindex(profs)
    prof = profs[i]
    # Test that the time for the inference is accounted for
    @test 0.15 < prof.mi_timing.exclusive_time
```
I fear the definitions of these times have already become hazy to me 😅. This one? 🤔 How do we know that the exclusive, rather than inclusive, time should be over 0.15 seconds?
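For what it's worth, the usual profiling convention (and roughly the distinction snoopi_deep draws) is that a frame's inclusive time covers its whole subtree, while exclusive time subtracts the children's inclusive times. A toy sketch in Python, with made-up numbers (this is not SnoopCompile's internal representation):

```python
# Toy illustration (made-up numbers, hypothetical frame layout):
# inclusive time = this frame plus all inference it triggered in callees;
# exclusive time = inclusive time minus the children's inclusive times.
def exclusive(frame):
    return frame["inclusive"] - sum(c["inclusive"] for c in frame["children"])

frame = {
    "inclusive": 0.20,  # total seconds attributed to this frame's subtree
    "children": [
        {"inclusive": 0.03, "children": []},
        {"inclusive": 0.02, "children": []},
    ],
}
print(round(exclusive(frame), 3))  # 0.15
```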
```julia
accum_root_time = time_ns() - inference_root_timing.cur_start_time
current_root_time = inference_root_timing.time + accum_root_time
```
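The bookkeeping above is the classic resumable-stopwatch pattern: time banked from closed intervals plus the currently open interval. A hedged Python analogue (the field names mirror the snippet, but the class itself is hypothetical):

```python
import time

# Hypothetical analogue of the snippet above: `time` banks closed intervals,
# `cur_start_time` marks the open one, and the current total is their sum.
class RootTiming:
    def __init__(self):
        self.time = 0.0             # accumulated time from closed intervals
        self.cur_start_time = None  # start of the interval currently running

    def start(self):
        self.cur_start_time = time.monotonic()

    def stop(self):
        self.time += time.monotonic() - self.cur_start_time
        self.cur_start_time = None

    def current_total(self):
        # Mirrors: current_root_time = time + (now - cur_start_time)
        accum = (time.monotonic() - self.cur_start_time
                 if self.cur_start_time is not None else 0.0)
        return self.time + accum
```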
I'm a bit hazy on these details by now but trust that it is correct 👍
Thanks again for the review, Sacha. I'm closing this for now, since we ended up taking a different approach, via JuliaLang/julia#47615 and #313.
NOTE: It turns out this PR isn't mergeable as-is, since it depends on assumptions in Julia that aren't valid.
Namely, we need JuliaLang/julia#47258 and JuliaLang/julia#47260 to avoid data races, but it turns out those don't make sense against julia master/1.9, because of recent changes to the inference lock.
At RAI, we have backported those two PRs against 1.8.2, and we're going to keep hacking on this branch to pull it into our internal build. We will follow up with actual changes to Julia 1.9 and SnoopCompile to try to make this work for real.
Following option 3 from this doc:
https://docs.google.com/document/d/1JO43w3V1Cwtb2CxWf4sYfAB68xbkTrCMr973iaJOFwY/edit#
This is a WIP PR to support invoking `@snoopi_deep` concurrently and in parallel from multiple Tasks. (It currently requires JuliaLang/julia#47258 to avoid data races, and JuliaLang/julia#47260 to allow reentrant inference timing.)
The idea is basically to share the global Julia type inference profile buffer, rather than clearing it at the start of every profile, keeping track of the start and stop indices for each invocation of `@snoopi_deep`. Then we also clear the beginning of the profile buffer whenever the oldest invocation ends, to ensure we aren't leaking memory.
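Ignoring the locking (which is the hard part this PR actually wrestles with), the index bookkeeping might look like this Python sketch; all names here are made up for illustration, not taken from the implementation:

```python
# Hypothetical sketch of the scheme described above: all invocations share one
# buffer; each records its start index on entry, and when an invocation ends
# we drop the prefix of the buffer that no still-live invocation's window can
# reference, so memory isn't leaked.
buffer = []       # shared inference-profile buffer
offset = 0        # number of entries already trimmed from the front
live_starts = {}  # invocation id -> absolute start index

def begin_profile(inv_id):
    live_starts[inv_id] = offset + len(buffer)

def end_profile(inv_id):
    global buffer, offset
    start = live_starts.pop(inv_id)
    profile = buffer[start - offset:]  # this invocation's window, up to "now"
    # Trim entries that no live invocation can still see:
    keep_from = min(live_starts.values(), default=offset + len(buffer))
    buffer = buffer[keep_from - offset:]
    offset = keep_from
    return profile

# Two overlapping invocations sharing the buffer:
begin_profile("A"); buffer.append("event-A")
begin_profile("B"); buffer.append("event-B")
print(end_profile("A"))  # ['event-A', 'event-B']: B's event overlaps A's window
print(end_profile("B"))  # ['event-B']
print(buffer)            # []: fully trimmed once nobody is profiling
```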