WIP - Reentrant concurrent snoopi_deep profiles. #309
Conversation
Note that of course one must be careful when interpreting the results of concurrently running profiles, since inference caused by invocation B will also show up in the profile for invocation A. But this is true of all of Julia's profiling and timing techniques right now, so it's just something the developer must be aware of.
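To make that cross-attribution concrete, here is a minimal, hypothetical sketch (in Python for brevity; none of these names come from SnoopCompile): two profiles sharing one global event buffer each see events the other triggers while both windows are open.

```python
# Hypothetical sketch (not SnoopCompile's API): a profile is just a window
# over a shared, append-only event buffer, so events appended by a concurrent
# invocation while the window is open land in this profile too.
events = []  # shared inference-event buffer

def profiled(body, log):
    """Record the slice of the shared buffer covering `body`'s execution."""
    start = len(events)
    body()
    log.append(events[start:])

logs_a, logs_b = [], []

def invocation_a():
    events.append("inference-for-A")
    # While A's profile is still open, B runs and appends its own events:
    profiled(lambda: events.append("inference-for-B"), logs_b)

profiled(invocation_a, logs_a)
print(logs_a[0])  # ['inference-for-A', 'inference-for-B']: B's event shows up in A's profile
```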
Codecov Report: Base: 85.47% // Head: 17.39% // Decreases project coverage by 68.09%.
Additional details and impacted files:

```
@@            Coverage Diff            @@
##           master     #309      +/-   ##
===========================================
- Coverage   85.47%   17.39%   -68.09%
===========================================
  Files          17       16        -1
  Lines        2162     1903      -259
===========================================
- Hits         1848      331     -1517
- Misses        314     1572     +1258
```
☔ View full report at Codecov.
Update: it turns out that JuliaLang/julia#47258 and JuliaLang/julia#47260 are no longer valid, now that we're shrinking the inference lock (which is 🎉 in general, of course!). So this PR will have to wait, because we'll probably need to rethink the inference profiler entirely, given that it currently assumes single-threaded, locked access to the inference profile buffer. :/
CC: @Sacha0 for review plz
```diff
@@ -863,6 +863,7 @@ end
     # pgdsgui(axs[2], rit; bystr="Inclusive", consts=true, interactive=false)
 end
```
Snuck in? :)
```julia
_name(frame::SnoopCompileCore.InferenceTiming) = frame.mi_info.mi.def.name
```

```julia
@testset "reentrant concurrent profiles 1 - overlap" begin
    # Warmup
```
Perhaps something like the following? :)
```diff
-# Warmup
+# Warmup to prevent, e.g., `+` from appearing in the profiles; only `ROOT` and `foo*` should appear.
```
(Likewise on the warmup comments below if you agree.)
```julia
@eval foo1(x) = x+2
@eval foo1(2)

# Test:
```
That this works is thoroughly pleasing.
```julia
sleep((i-1) / 10) # (Divide by 10 so the test isn't too slow)
snoop_time = @timed SnoopCompile.@snoopi_deep @eval begin
    $(Symbol("foo$i"))(x) = x + 1
    sleep(1.5 / 10)
```
I hope that these delays are large enough (and I think they should be, by orders of magnitude) to prevent nondeterministic failures due to jitter. But I've found the machines are just lying in wait for the moment I think something like that. Then 🦈 🩸.
```julia
# Test that the inclusive time (the total time reported by snoopi_deep) matches
# the actual time to do the snoopi_deep, as measured by `@time`.
# These should both be approximately ~0.15 seconds.
@info prof.mi_timing.inclusive_time
```
Debugging straggler? :)
```julia
@testset for i in eachindex(profs)
    prof = profs[i]
    # Test that the time for the inference is accounted for
    @test 0.15 < prof.mi_timing.exclusive_time
```
I fear the definitions of these times have already become hazy to me 😅. This one? 🤔 How do we know that the exclusive, rather than inclusive, time should be over 0.15 seconds?
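For what it's worth, the usual profiling convention (and roughly the distinction snoopi_deep draws) is that a frame's inclusive time covers its whole subtree, while exclusive time subtracts the children's inclusive times. A toy sketch in Python, with made-up numbers (this is not SnoopCompile's internal representation):

```python
# Toy illustration (made-up numbers, hypothetical frame layout):
# inclusive time = this frame plus all inference it triggered in callees;
# exclusive time = inclusive time minus the children's inclusive times.
def exclusive(frame):
    return frame["inclusive"] - sum(c["inclusive"] for c in frame["children"])

frame = {
    "inclusive": 0.20,  # total seconds attributed to this frame's subtree
    "children": [
        {"inclusive": 0.03, "children": []},
        {"inclusive": 0.02, "children": []},
    ],
}
print(round(exclusive(frame), 3))  # 0.15
```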
```julia
accum_root_time = time_ns() - inference_root_timing.cur_start_time
current_root_time = inference_root_timing.time + accum_root_time
```
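The bookkeeping above is the classic resumable-stopwatch pattern: time banked from closed intervals plus the currently open interval. A hedged Python analogue (the field names mirror the snippet, but the class itself is hypothetical):

```python
import time

# Hypothetical analogue of the snippet above: `time` banks closed intervals,
# `cur_start_time` marks the open one, and the current total is their sum.
class RootTiming:
    def __init__(self):
        self.time = 0.0             # accumulated time from closed intervals
        self.cur_start_time = None  # start of the interval currently running

    def start(self):
        self.cur_start_time = time.monotonic()

    def stop(self):
        self.time += time.monotonic() - self.cur_start_time
        self.cur_start_time = None

    def current_total(self):
        # Mirrors: current_root_time = time + (now - cur_start_time)
        accum = (time.monotonic() - self.cur_start_time
                 if self.cur_start_time is not None else 0.0)
        return self.time + accum
```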
I'm a bit hazy on these details by now but trust that it is correct 👍
Thanks again for the review, Sacha. I'm closing this for now, since we ended up taking a different approach, via JuliaLang/julia#47615 and #313.
NOTE: It turns out this PR isn't mergeable as-is, since it depends on assumptions in Julia that aren't valid.
Namely, we need JuliaLang/julia#47258 and JuliaLang/julia#47260 to avoid data races, but it turns out those don't make sense against julia master/1.9, because of recent changes to the inference lock.
At RAI, we have backported those two PRs against 1.8.2, and we're going to keep hacking on this branch to pull it into our internal build. We will follow up with actual changes to Julia 1.9 and SnoopCompile to try to make this work for real.
Following option 3 from this doc:
https://docs.google.com/document/d/1JO43w3V1Cwtb2CxWf4sYfAB68xbkTrCMr973iaJOFwY/edit#
This is a WIP PR to support invoking `@snoopi_deep` concurrently and in parallel from multiple Tasks. (It currently requires JuliaLang/julia#47258 to avoid data races, and JuliaLang/julia#47260 to allow reentrant inference timing.)
The idea is basically to share the global Julia type inference profile buffer, rather than clearing it at the start of every profile, keeping track of the start and stop indices for each invocation of `@snoopi_deep`. Then we also clear the beginning of the profile buffer whenever the oldest invocation ends, to ensure we aren't leaking memory.
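Ignoring the locking (which is the hard part this PR actually wrestles with), the index bookkeeping might look like this Python sketch; all names here are made up for illustration, not taken from the implementation:

```python
# Hypothetical sketch of the scheme described above: all invocations share one
# buffer; each records its start index on entry, and when an invocation ends
# we drop the prefix of the buffer that no still-live invocation's window can
# reference, so memory isn't leaked.
buffer = []       # shared inference-profile buffer
offset = 0        # number of entries already trimmed from the front
live_starts = {}  # invocation id -> absolute start index

def begin_profile(inv_id):
    live_starts[inv_id] = offset + len(buffer)

def end_profile(inv_id):
    global buffer, offset
    start = live_starts.pop(inv_id)
    profile = buffer[start - offset:]  # this invocation's window, up to "now"
    # Trim entries that no live invocation can still see:
    keep_from = min(live_starts.values(), default=offset + len(buffer))
    buffer = buffer[keep_from - offset:]
    offset = keep_from
    return profile

# Two overlapping invocations sharing the buffer:
begin_profile("A"); buffer.append("event-A")
begin_profile("B"); buffer.append("event-B")
print(end_profile("A"))  # ['event-A', 'event-B']: B's event overlaps A's window
print(end_profile("B"))  # ['event-B']
print(buffer)            # []: fully trimmed once nobody is profiling
```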