diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 49f7038e..d03869bb 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.11.1","generation_timestamp":"2024-11-24T16:07:42","documenter_version":"1.8.0"}} \ No newline at end of file +{"documenter":{"julia_version":"1.11.2","generation_timestamp":"2024-12-10T11:35:43","documenter_version":"1.8.0"}} \ No newline at end of file diff --git a/dev/explanations/basic/index.html b/dev/explanations/basic/index.html index fbccb6c0..a5c3bfc4 100644 --- a/dev/explanations/basic/index.html +++ b/dev/explanations/basic/index.html @@ -1,2 +1,2 @@

Understanding SnoopCompile and Julia's compilation pipeline

Julia uses Just-in-time (JIT) compilation to generate the code that runs on your CPU. Broadly speaking, there are two major compilation steps: inference and code generation. Inference is the process of determining the type of each object, which in turn determines which specific methods get called; once type inference is complete, code generation performs optimizations and ultimately generates the assembly language (native code) used on CPUs. Some aspects of this process are documented here.

Using code that has never been compiled requires that it first be JIT-compiled, and this contributes to the latency of using the package. In some circumstances, you can cache (store) the results of compilation to files to reduce the latency when your package is used. These are the *.ji and *.so files that live in the compiled directory of your Julia depot, usually located at ~/.julia/compiled. However, if these files become large, loading them can be another source of latency: Julia needs time both to load and validate the cached compiled code. Minimizing the latency of using a package involves focusing on caching the compilation of code that is both commonly used and takes time to compile.

Caching code for later use is called precompilation. Julia has had some forms of precompilation almost since the very first packages. However, it was Julia 1.9 that first supported "complete" precompilation, including the ability to store native code in shared-library cache files.

SnoopCompile is designed to help you analyze the costs of JIT-compilation, identify the key bottlenecks that contribute to latency, and set up precompile directives to see whether they produce measurable benefits.

Package precompilation

When a package is precompiled, here's what happens under the hood:

  • Julia loads all of the package's dependencies (the ones in the [deps] section of the Project.toml file), typically from precompile cache files
  • Julia evaluates the source code (text files) that define the package module(s). Evaluating function foo(args...) ... end creates a new method for the function foo. Note that:
    • the source code might also contain statements that create "data" (e.g., consts). In some cases this can lead to some subtle precompilation "gotchas"
    • the source code might also contain a precompile workload, which forces compilation and tracking of package methods (a minimal sketch follows this list).
  • Julia iterates over the module contents and writes the result to disk. Note that the module contents might include compiled code, and if so it is written along with everything else to the cache file.
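
As a concrete illustration of the precompile-workload bullet above, here is a minimal sketch using PrecompileTools; process and data are hypothetical stand-ins for your package's own methods and inputs:

using PrecompileTools

@setup_workload begin
    # Setup code runs during precompilation but is not itself cached
    data = [1.0, 2.0, 3.0]                # hypothetical input
    @compile_workload begin
        # Calls made here are compiled (and cached) for the argument types actually used
        process(data)                     # `process` is a hypothetical package method
    end
end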

When Julia loads your package, it just loads the "snapshot" stored in the cache file: it does not re-evaluate the source-text files that defined your package! It is appropriate to think of the source files of your package as "build scripts" that create your module; once the "build scripts" are executed, it's the module itself that gets cached, and the job of the build scripts is done.

diff --git a/dev/explanations/fixing_inference/index.html b/dev/explanations/fixing_inference/index.html index 51189b3a..0223926d 100644 --- a/dev/explanations/fixing_inference/index.html +++ b/dev/explanations/fixing_inference/index.html @@ -50,4 +50,4 @@ return getfield(d, :maker)::Union{String,Symbol} end return getfield(d, name) end

Julia's constant propagation will ensure that most accesses of those fields will be determined at compile-time, so this simple change robustly fixes many inference problems.

Fixing Core.Box

Julia issue 15276 is one of the more surprising forms of inference failure; it is the most common cause of a Core.Box annotation. If other variables depend on the Boxed variable, then a single Core.Box can lead to widespread inference problems. For this reason, these are also among the first inference problems you should tackle.

Read this explanation of why this happens and what you can do to fix it. If you are directed to find Core.Box inference triggers via suggest, you may need to explore around the call site a bit: the inference trigger may be in the closure itself, but the fix needs to go in the method that creates the closure.

Use of ascend is highly recommended for fixing Core.Box inference failures.
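
To make the pattern concrete, here is a minimal sketch based on the classic example from Julia's performance tips; abmult is illustrative and not part of SnoopCompile:

function abmult(r::Int)
    if r < 0
        r = -r              # reassigning a captured variable forces a Core.Box
    end
    f = x -> x * r          # this closure captures the Boxed `r`
    return f
end

function abmult_fixed(r0::Int)
    r = r0 < 0 ? -r0 : r0   # bind `r` once, before the closure is created
    f = let r = r           # the let block gives the closure its own binding
        x -> x * r
    end
    return f
end

Note that the fix lives in the method that creates the closure (abmult_fixed), even though the inference trigger surfaces inside the closure itself.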

Handling edge cases

You can sometimes get invalidations from failing to handle "formal" possibilities. For example, operations with regular expressions might return a Union{Nothing, RegexMatch}. Code that fails to account for the possibility that nothing might be returned can exhibit poor type inference. For example, a comprehension

ms = [m.match for m in match.((rex,), my_strings)]

might be replaced with

ms = [m.match for m in match.((rex,), my_strings) if m !== nothing]

and return a better-typed result.

diff --git a/dev/explanations/gotchas/index.html b/dev/explanations/gotchas/index.html index 391b28ef..63e58e2a 100644 --- a/dev/explanations/gotchas/index.html +++ b/dev/explanations/gotchas/index.html @@ -2,4 +2,4 @@ Precompilation "gotcha"s · SnoopCompile

Precompilation "gotcha"s

Running code during module definition

Suppose you're working on an astronomy package and your source code has a line

const planets = map(makeplanet, ["Mercury", ...])

Julia will dutifully create planets and store it in the package's precompile cache file. This also runs makeplanet, and if this is the first time it gets run, it will compile makeplanet. Assuming that makeplanet is a method defined in the package, the compiled code for makeplanet will be stored in the cache file.

However, two circumstances can lead to puzzling omissions from the cache files:

  • if makeplanet is a method defined in a dependency of your package, it will not be cached in your package. You'd want to add precompilation of makeplanet to the package that creates that method.
  • if makeplanet is poorly inferred and uses runtime dispatch, any such callees that are not owned by your package will not be cached. For example, suppose makeplanet ends up calling methods in Base Julia or its standard libraries that are not precompiled into Julia itself: the compiled code for those methods will not be added to the cache file.

One option to ensure this dependent code gets cached is to create planets inside PrecompileTools.@compile_workload:

@compile_workload begin
    global planets
    const planets = map(makeplanet, ["Mercury", ...])
end

Note that your package definition can have multiple @compile_workload blocks.

diff --git a/dev/explanations/tools/index.html b/dev/explanations/tools/index.html index 94486d9d..a1809f04 100644 --- a/dev/explanations/tools/index.html +++ b/dev/explanations/tools/index.html @@ -1,2 +1,2 @@

Package roles and alternatives

SnoopCompileCore

SnoopCompileCore is a tiny package with no dependencies; it's used for collecting data, and it has been designed in such a way that it cannot cause any invalidations of its own. Collecting data on invalidations and inference with SnoopCompileCore is the only way you can be sure you are observing the "native state" of your code.

SnoopCompile

SnoopCompile is a much larger package that performs analysis on the data collected by SnoopCompileCore; loading SnoopCompile can (and does) trigger invalidations. Consequently, you're urged to always collect data with just SnoopCompileCore loaded, and wait to load SnoopCompile until after you've finished collecting the data.
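
In practice, the recommended workflow looks like the following sketch, where SomePkg is a hypothetical package you want to analyze:

using SnoopCompileCore
invs = @snoop_invalidations using SomePkg   # collect while only SnoopCompileCore is loaded

using SnoopCompile                          # safe to load once collection is finished
trees = invalidation_trees(invs)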

Cthulhu

Cthulhu is a companion package that gives deep insights into the origin of invalidations or inference failures.

AbstractTrees

AbstractTrees is the one package in this list that can be both a "workhorse" and a developer tool. SnoopCompile uses it mostly for pretty-printing.

JET

JET is a powerful developer tool that in some ways is an alternative to SnoopCompile. While the two have different goals, the packages have some overlap in what they can tell you about your code. However, their mechanisms of action are fundamentally different:

  • JET is a "static analyzer," which means that it analyzes the code itself. JET can tell you about inference failures (runtime dispatch) much like SnoopCompile, with a major advantage: SnoopCompileCore omits information about any callees that are already compiled, but JET's @report_opt provides exhaustive information about the entire inferable callgraph (i.e., the part of the callgraph that inference can predict from the initial call) regardless of whether it has been previously compiled. With JET, you don't have to remember to run each analysis in a fresh session.

  • SnoopCompileCore collects data by watching normal inference at work. On code that hasn't been compiled previously, this can yield results similar to JET's, with a different major advantage: JET can't "see through" runtime dispatch, but SnoopCompileCore can. With SnoopCompile, you can immediately get a holistic view of your entire callgraph.

Combining JET and SnoopCompile can provide insights that are difficult to obtain with either package in isolation. See the Tutorial on JET integration.
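
For orientation, a minimal JET session might look like the following sketch; sumover is a contrived stand-in, and the exact report depends on your Julia and JET versions:

using JET

sumover(v) = sum(v)                # iterating a Vector{Any} forces runtime dispatch
@report_opt sumover(Any[1, 2, 3])  # static analysis reports the dispatch sites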

diff --git a/dev/index.html b/dev/index.html index 05d97f80..022f2739 100644 --- a/dev/index.html +++ b/dev/index.html @@ -1,2 +1,2 @@

SnoopCompile.jl

Julia is fast, but its execution speed depends on optimizing code through compilation. Code must be compiled before you can use it, and unfortunately compilation is slow. This can cause latency the first time you use code: this latency is often called time-to-first-plot (TTFP) or more generally time-to-first-execution (TTFX). If something feels slow the first time you use it, and fast thereafter, you're probably experiencing the latency of compilation. Note that TTFX is distinct from time-to-load (TTL, which refers to the time you spend waiting for using MyPkg to finish), even though both contribute to latency.
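
You can observe the distinction at the REPL; SomePkg and frobnicate are hypothetical:

julia> @time using SomePkg           # time-to-load (TTL)

julia> @time SomePkg.frobnicate(1)   # first call: TTFX, usually dominated by compilation

julia> @time SomePkg.frobnicate(1)   # second call: just runtime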

Modern versions of Julia can store compiled code to disk (precompilation) to reduce or eliminate latency. Users and developers who are interested in reducing TTFX should first head to PrecompileTools, read its documentation thoroughly, and try using it to solve latency problems.

This package, SnoopCompile, should be considered when:

  • precompilation doesn't reduce TTFX as much as you wish
  • precompilation "works," but only in isolation: as soon as you load (certain) additional packages, TTFX is bad again
  • you're wondering if you can reduce the amount of time needed to precompile your package and/or the size of the precompilation cache files

In other words, SnoopCompile is a diagnostic package that helps reveal the causes of latency. Historically, it preceded PrecompileTools, and indeed PrecompileTools was split out from SnoopCompile. Today, SnoopCompile is generally needed only when PrecompileTools fails to deliver the desired benefits.

SnoopCompile analysis modes

SnoopCompile "snoops" on the Julia compiler, collecting information that may be useful to developers. Here are some of the things you can do with SnoopCompile:

  • diagnose invalidations, cases where Julia must throw away previously-compiled code (see Tutorial on @snoop_invalidations)
  • trace inference, to learn what code is being newly (or freshly) analyzed in an early stage of the compilation pipeline (Tutorial on @snoop_inference)
  • trace code generation by LLVM, a late stage in the compilation pipeline (Tutorial on @snoop_llvm)
  • reveal methods with excessive numbers of compiler-generated specializations, a.k.a. profile-guided despecialization (Tutorial on PGDS)
  • integrate with tools like JET to further reduce the risk that your lovingly-precompiled code will be invalidated by loading other packages (Tutorial on JET integration)

Background information

If nothing else, you should know this:

  • invalidations occur when you load code (e.g., using MyPkg) or otherwise define new methods
  • inference and other stages of compilation occur the first time you run code for a particular combination of input types

The individual tutorials briefly explain core concepts. More detail can be found in Understanding SnoopCompile and Julia's compilation pipeline.

Who should use this package

SnoopCompile is intended primarily for package developers who want to improve the experience for their users. It is also recommended for users who are willing to "dig deep" and understand why packages they depend on have high latency. Your experience with latency may be personal, as it can depend on the specific combination of packages you load. If latency troubles you, don't make the assumption that it must be unfixable: you might be the first person affected by that specific cause of latency.

diff --git a/dev/objects.inv b/dev/objects.inv index 926119ef..f1b594d0 100644 Binary files a/dev/objects.inv and b/dev/objects.inv differ diff --git a/dev/reference/index.html b/dev/reference/index.html index b9e5cf9f..d92dcf48 100644 --- a/dev/reference/index.html +++ b/dev/reference/index.html @@ -1,14 +1,14 @@

Reference

Data collection

SnoopCompileCore.@snoop_invalidationsMacro
invs = @snoop_invalidations expr

Capture method cache invalidations triggered by evaluating expr. invs is a sequence of invalidated Core.MethodInstances together with "explanations," consisting of integers (encoding depth) and strings (documenting the source of an invalidation).

Unless you are working at a low level, you essentially always want to pass invs directly to SnoopCompile.invalidation_trees.

Extended help

invs is in a format where the "reason" comes after the items. Method deletion results in the sequence

[zero or more (mi, "invalidate_mt_cache") pairs..., zero or more (depth1 tree, loctag) pairs..., method, loctag] with loctag = "jl_method_table_disable"

where mi means a MethodInstance. depth1 means a sequence starting at depth=1.

Method insertion results in the sequence

[zero or more (depth0 tree, sig) pairs..., same info as with delete_method except loctag = "jl_method_table_insert"]

The authoritative reference is Julia's own src/gf.c file.
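
A minimal usage sketch (MyNumber is hypothetical, and in a fresh session this may record few or no invalidations; loading packages is a more typical trigger):

using SnoopCompileCore

struct MyNumber <: Number          # hypothetical type
    val::Float64
end

invs = @snoop_invalidations begin
    # defining a new, more specific method can invalidate compiled callers
    Base.convert(::Type{Float64}, x::MyNumber) = x.val
end

using SnoopCompile
trees = invalidation_trees(invs)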

source
SnoopCompileCore.@snoop_inferenceMacro
tinf = @snoop_inference commands;

Produce a profile of julia's type inference, recording the amount of time spent inferring every MethodInstance processed while executing commands. Each fresh entrance to type inference (whether executed directly in commands or because a call was made by runtime-dispatch) also collects a backtrace so the caller can be identified.

tinf is a tree, each node containing data on a particular inference "frame" (the method, argument-type specializations, parameters, and even any constant-propagated values). Each reports the exclusive/inclusive times, where the exclusive time corresponds to the time spent inferring this frame in and of itself, whereas the inclusive time includes the time needed to infer all the callees of this frame.

The top-level node in this profile tree is ROOT. Uniquely, its exclusive time corresponds to the time spent not in julia's type inference (codegen, llvm_opt, runtime, etc).

Working with tinf effectively requires loading SnoopCompile.

Warning

Note the semicolon ; at the end of the @snoop_inference macro call. Because SnoopCompileCore is not permitted to invalidate any code, it cannot define the Base.show methods that pretty-print tinf. Defer inspection of tinf until SnoopCompile has been loaded.

Example

julia> tinf = @snoop_inference begin
            sort(rand(100))  # Evaluate some code and profile julia's type inference
       end;
source
SnoopCompileCore.@snoop_llvmMacro
@snoop_llvm "func_names.csv" "llvm_timings.yaml" begin
     # Commands to execute, in a new process
end

causes the julia compiler to log timing information for LLVM optimization during the provided commands to the files "func_names.csv" and "llvm_timings.yaml". These files can be used for the input to SnoopCompile.read_snoop_llvm("func_names.csv", "llvm_timings.yaml").

The logs contain the amount of time spent optimizing each "llvm module", and information about each module, where a module is a collection of functions being optimized together.
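
A usage sketch, assuming a hypothetical SomePkg supplies the workload; remember that the commands run in a new process:

using SnoopCompileCore

@snoop_llvm "func_names.csv" "llvm_timings.yaml" begin
    using SomePkg              # hypothetical package
    SomePkg.workload()         # hypothetical workload to compile
end

using SnoopCompile
times, info = SnoopCompile.read_snoop_llvm("func_names.csv", "llvm_timings.yaml")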

source

GUIs

SnoopCompile.flamegraphFunction
flamegraph(tinf::InferenceTimingNode; tmin=0.0, excluded_modules=Set([Main]), mode=nothing)

Convert the call tree of inference timings returned from @snoop_inference into a FlameGraph. Returns a FlameGraphs.FlameGraph structure that represents the timing trace recorded for type inference.

Frames that take less than tmin seconds of inclusive time (total time including the frame and all of its children) will not be included in the resultant FlameGraph. This can be helpful if you have a very big profile, to save on processing time.

Non-precompilable frames are marked in reddish colors. excluded_modules can be used to mark methods defined in modules to which you cannot or do not wish to add precompiles.

mode controls how frames are named in tools like ProfileView. nothing uses the default of just the qualified function name, whereas supplying a mode=Dict(method => count) that counts the number of specializations of each method will cause the number of specializations to be included in the frame name.

Example

We'll use SnoopCompile.flatten_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.flatten_demo()
 InferenceTimingNode: 0.002148974/0.002767166 on Core.Compiler.Timings.ROOT() with 1 direct children
 
 julia> fg = flamegraph(tinf)
Node(FlameGraphs.NodeData(ROOT() at typeinfer.jl:75, 0x00, 0:3334431))
julia> ProfileView.view(fg);  # Display the FlameGraph in a package that supports it

You should be able to reconcile the resulting flamegraph to print_tree(tinf) (see flatten).

The empty horizontal periods in the flamegraph correspond to times when something other than inference is running. The total width of the flamegraph is set from the ROOT node.

source
SnoopCompile.pgdsguiFunction
methodref, ax = pgdsgui(tinf::InferenceTimingNode; consts::Bool=true, by=inclusive)
methodref     = pgdsgui(ax, tinf::InferenceTimingNode; kwargs...)

Create a scatter plot comparing:

  • (vertical axis) the inference time for all instances of each Method, as captured by tinf;
  • (horizontal axis) the run time cost, as estimated by capturing a @profile before calling this function.

Each dot corresponds to a single method. The face color encodes the number of times that method was inferred, and the edge color corresponds to the fraction of the runtime spent on runtime dispatch (black is 0%, bright red is 100%). Clicking on a dot prints the method (or location, if inlined) to the REPL, and sets methodref[] to that method.

ax is the pyplot axis of the scatterplot.

Compat

pgdsgui depends on PyPlot via Julia extensions. You must load both SnoopCompile and PyPlot for this function to be defined.

source

Analysis of invalidations

SnoopCompile.uinvalidatedFunction
umis = uinvalidated(invlist)

Return the unique invalidated MethodInstances. invlist is obtained from SnoopCompileCore.@snoop_invalidations. This is similar to filtering for MethodInstances in invlist, except that it discards any tagged "invalidate_mt_cache". These can typically be ignored because they are nearly inconsequential: they do not invalidate any compiled code, they only transiently affect an optimization of runtime dispatch.
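
A brief sketch of typical use, with invs collected as above:

umis = uinvalidated(invs)   # unique MethodInstances, "invalidate_mt_cache" entries discarded
length(umis)                # a rough measure of how much compiled code was invalidated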

source
SnoopCompile.invalidation_treesFunction
trees = invalidation_trees(list)

Parse list, as captured by SnoopCompileCore.@snoop_invalidations, into a set of invalidation trees, where parent nodes were called by their children.

Example

julia> f(x::Int)  = 1
 f (generic function with 1 method)
 
 julia> f(x::Bool) = 2
@@ -30,14 +30,14 @@
 julia> trees = invalidation_trees(@snoop_invalidations f(::AbstractFloat) = 3)
 1-element Array{SnoopCompile.MethodInvalidations,1}:
  inserting f(::AbstractFloat) in Main at REPL[36]:1 invalidated:
   mt_backedges: 1: signature Tuple{typeof(f),Any} triggered MethodInstance for applyf(::Array{Any,1}) (1 children) more specific

See the documentation for further details.

source
SnoopCompile.precompile_blockersFunction
staletrees = precompile_blockers(invalidations, tinf::InferenceTimingNode)

Select just those invalidations that contribute to "stale nodes" in tinf, and link them together. This can allow one to identify specific blockers of precompilation for particular MethodInstances.

Example

using SnoopCompileCore
 invalidations = @snoop_invalidations using PkgA, PkgB;
 using SnoopCompile
 trees = invalidation_trees(invalidations)
 tinf = @snoop_inference begin
     some_workload()
 end
staletrees = precompile_blockers(trees, tinf)

In many cases, this reduces the number of invalidations that require analysis by one or more orders of magnitude.

Info

precompile_blockers is experimental and has not yet been thoroughly vetted by real-world use. Users are encouraged to try it and report any "misses" or unnecessary "hits."

source
SnoopCompile.filtermodFunction
modtrigs = filtermod(mod::Module, mtrigs::AbstractVector{MethodTriggers})

Select just the method-based triggers arising from a particular module.

source
thinned = filtermod(module, trees::AbstractVector{MethodInvalidations}; recursive=false)

Select just the cases of invalidating a method defined in module.

If recursive is false, only the roots of trees are examined (i.e., the proximal source of the invalidation must be in module). If recursive is true, then thinned contains all routes to a method in module.
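
A sketch of both forms, where MyPkg is a hypothetical module:

thinned = filtermod(MyPkg, trees)                      # proximal source must be in MyPkg
thinned_all = filtermod(MyPkg, trees; recursive=true)  # any route through a MyPkg method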

source
SnoopCompile.findcallerFunction
methinvs = findcaller(method::Method, trees)

Find a path through trees that reaches method. Returns a single MethodInvalidations object.

Examples

Suppose you know that loading package SomePkg triggers invalidation of f(data). You can find the specific source of invalidation as follows:

f(data)                             # run once to force compilation
 m = @which f(data)
 using SnoopCompile
 trees = invalidation_trees(@snoop_invalidations using SomePkg)
@@ -54,7 +54,7 @@
 
 julia> findcaller(m, trees)
 inserting ==(x, y::SomeType) in SomeOtherPkg at /path/to/code:100 invalidated:
   backedges: 1: superseding ==(x, y) in Base at operators.jl:83 with MethodInstance for ==(::Symbol, ::Any) (16 children) more specific
source
SnoopCompile.report_invalidationsFunction
report_invalidations(
     io::IO = stdout;
     invalidations,
     n_rows::Int = 10,
@@ -68,7 +68,7 @@
 
 using SnoopCompile
 using PrettyTables # to load report_invalidations
report_invalidations(;invalidations)

Using report_invalidations requires that you first load the PrettyTables.jl package.

source

Analysis of @snoop_inference

SnoopCompile.flattenFunction
flatten(tinf; tmin = 0.0, sortby=exclusive)

Flatten the execution graph of InferenceTimingNodes returned from @snoop_inference into a Vector of InferenceTiming frames, each encoding the time needed for inference of a single MethodInstance. By default, results are sorted by exclusive time (the time for inferring the MethodInstance itself, not including any inference of its callees); other options are sortby=inclusive which includes the time needed for the callees, or nothing to obtain them in the order they were inferred (depth-first order).

Example

We'll use SnoopCompile.flatten_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.flatten_demo()
 InferenceTimingNode: 0.002148974/0.002767166 on Core.Compiler.Timings.ROOT() with 1 direct children
 
 julia> using AbstractTrees; print_tree(tinf)
@@ -102,7 +102,7 @@
  InferenceTiming: 0.000136496/0.000136496 on SnoopCompile.FlattenDemo.domath(::Int64)
  InferenceTiming: 9.43e-5/0.00035551200000000005 on SnoopCompile.FlattenDemo.dostuff(::SnoopCompile.FlattenDemo.MyType{Int64})
  InferenceTiming: 0.000150891/0.0006117210000000001 on SnoopCompile.FlattenDemo.packintype(::Int64)
 InferenceTiming: 0.002423543/0.0030352639999999998 on Core.Compiler.Timings.ROOT()

As you can see, sortby affects not just the order but also the selection of frames; with exclusive times, dostuff did not on its own rise above threshold, but it does when using inclusive times.

See also: accumulate_by_source.

source
SnoopCompile.accumulate_by_sourceFunction
accumulate_by_source(flattened; tmin = 0.0, by=exclusive)

Add the inference timings for all MethodInstances of a single Method together. flattened is the output of flatten. Returns a list of (t, method) tuples.

When the accumulated time for a Method is large, but each instance is small, it indicates that it is being inferred for many specializations (which might include specializations with different constants).

Example

We'll use SnoopCompile.flatten_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.flatten_demo()
 InferenceTimingNode: 0.004978/0.005447 on Core.Compiler.Timings.ROOT() with 1 direct children
 
 julia> accumulate_by_source(flatten(tinf))
@@ -113,15 +113,15 @@
  (8.9997e-5, (var"#ctor-self#"::Type{SnoopCompile.FlattenDemo.MyType{T}} where T)(x) @ SnoopCompile.FlattenDemo ~/.julia/dev/SnoopCompile/src/inference_demos.jl:35)
  (9.2256e-5, domath(x) @ SnoopCompile.FlattenDemo ~/.julia/dev/SnoopCompile/src/inference_demos.jl:41)
  (0.000117514, packintype(x) @ SnoopCompile.FlattenDemo ~/.julia/dev/SnoopCompile/src/inference_demos.jl:37)
 (0.004977755, ROOT() @ Core.Compiler.Timings compiler/typeinfer.jl:79)

Compared to the output from flatten, the two inference passes on getproperty have been consolidated into a single aggregate call.

source
mtrigs = accumulate_by_source(Method, itrigs::AbstractVector{InferenceTrigger})

Consolidate inference triggers via their caller method. mtrigs is a vector of Method=>list pairs, where list is a list of InferenceTriggers.

source
loctrigs = accumulate_by_source(itrigs::AbstractVector{InferenceTrigger})

Aggregate inference triggers by location (function, file, and line number) of the caller.

Example

We collect data using the SnoopCompile.itrigs_demo:

julia> itrigs = inference_triggers(SnoopCompile.itrigs_demo())
 2-element Vector{InferenceTrigger}:
  Inference triggered to call MethodInstance for double(::UInt8) from calldouble1 (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:762) inlined into MethodInstance for calldouble2(::Vector{Vector{Any}}) (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:763)
  Inference triggered to call MethodInstance for double(::Float64) from calldouble1 (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:762) inlined into MethodInstance for calldouble2(::Vector{Vector{Any}}) (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:763)
 
 julia> accumulate_by_source(itrigs)
 1-element Vector{SnoopCompile.LocationTriggers}:
    calldouble1 at /pathto/SnoopCompile/src/parcel_snoop_inference.jl:762 (2 callees from 1 callers)
source
SnoopCompile.collect_forFunction
list = collect_for(m::Method, tinf::InferenceTimingNode)
list = collect_for(m::MethodInstance, tinf::InferenceTimingNode)

Collect all InferenceTimingNodes (descendants of tinf) that match m.

source
SnoopCompile.staleinstancesFunction
staleinstances(tinf::InferenceTimingNode)

Return a list of InferenceTimingNodes corresponding to MethodInstances that have "stale" code (specifically, CodeInstances with outdated max_world world ages). These may be a hint that invalidation occurred while running the workload provided to @snoop_inference, and consequently an important origin of (re)inference.

Warning

staleinstances only looks retrospectively for stale code; it does not distinguish whether the code became stale while running @snoop_inference from whether it was already stale before execution commenced.

While staleinstances is recommended as a useful "sanity check" to run before performing a detailed analysis of inference, any serious examination of invalidation should use @snoop_invalidations.

For more information about world age, see https://docs.julialang.org/en/v1/manual/methods/#Redefining-Methods.

source
SnoopCompile.inference_triggersFunction
itrigs = inference_triggers(tinf::InferenceTimingNode; exclude_toplevel=true)

Collect the "triggers" of inference, each a fresh entry into inference via a call dispatched at runtime. All the entries in itrigs are previously uninferred, or are freshly-inferred for specific constant inputs.

exclude_toplevel determines whether calls made from the REPL, include, or test suites are excluded.

Example

We'll use SnoopCompile.itrigs_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.itrigs_demo()
 InferenceTimingNode: 0.004490576/0.004711168 on Core.Compiler.Timings.ROOT() with 2 direct children
 
 julia> itrigs = inference_triggers(tinf)
@@ -136,23 +136,23 @@
  >   double(::Float64)
        calldouble1 at /pathto/SnoopCompile/src/inference_demos.jl:86 => calldouble2(::Vector{Vector{Any}}) at /pathto/SnoopCompile/src/inference_demos.jl:87
          calleach(::Vector{Vector{Vector{Any}}}) at /pathto/SnoopCompile/src/inference_demos.jl:88
...
source
SnoopCompile.trigger_treeFunction
root = trigger_tree(itrigs)

Organize inference triggers itrigs in tree format, grouping items via the call tree.

It is a tree rather than a more general graph due to the fact that caching inference results means that each node gets visited only once.
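
A usage sketch, building on itrigs from inference_triggers; the exact display may vary by version:

root = trigger_tree(itrigs)
using AbstractTrees
print_tree(root)            # show the triggers organized by their call tree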

source
SnoopCompile.suggestFunction
suggest(itrig::InferenceTrigger)

Analyze itrig and attempt to suggest an interpretation or remedy. This returns a structure of type Suggested; the easiest thing to do with the result is to show it; however, you can also filter a list of suggestions.

Example

julia> itrigs = inference_triggers(tinf);
 
 julia> sugs = suggest.(itrigs);
 
julia> sugs_important = filter(!isignorable, sugs)    # discard the ones that probably don't need to be addressed
Warning

Suggestions are approximate at best; most often, the proposed fixes should not be taken literally, but instead taken as a hint about the "outcome" of a particular runtime dispatch incident. The suggestions target calls made with non-inferrable arguments, but often the best place to fix the problem is at an earlier stage in the code, where the argument was first computed.

You can get much deeper insight via ascend (and Cthulhu generally), and even stacktrace is often useful. Suggestions are intended to be a quick and easier-to-comprehend first pass at analyzing an inference trigger.

source
SnoopCompile.callerinstanceFunction
mi = callerinstance(itrig::InferenceTrigger)

Return the MethodInstance mi of the caller in the selected stackframe in itrig.

source
SnoopCompile.callingframeFunction
itrigcaller = callingframe(itrig::InferenceTrigger)

"Step out" one layer of the stacktrace, referencing the caller of the current frame of itrig.

You can retrieve the proximal trigger of inference with InferenceTrigger(itrigcaller).

Example

We collect data using the SnoopCompile.itrigs_demo:

julia> itrig = inference_triggers(SnoopCompile.itrigs_demo())[1]
 Inference triggered to call MethodInstance for double(::UInt8) from calldouble1 (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:762) inlined into MethodInstance for calldouble2(::Vector{Vector{Any}}) (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:763)
 
 julia> itrigcaller = callingframe(itrig)
Inference triggered to call MethodInstance for double(::UInt8) from calleach (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:764) with specialization MethodInstance for calleach(::Vector{Vector{Vector{Any}}})
source
SnoopCompile.skiphigherorderFunction
itrignew = skiphigherorder(itrig; exact::Bool=false)

Attempt to skip over frames of higher-order functions that take the callee as a function-argument. This can be useful if you're analyzing inference triggers for an entire package and would prefer to assign triggers to package-code rather than Base functions like map!, broadcast, etc.

Example

We collect data using the SnoopCompile.itrigs_higherorder_demo:

julia> itrig = inference_triggers(SnoopCompile.itrigs_higherorder_demo())[1]
 Inference triggered to call MethodInstance for double(::Float64) from mymap! (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:706) with specialization MethodInstance for mymap!(::typeof(SnoopCompile.ItrigHigherOrderDemo.double), ::Vector{Any}, ::Vector{Any})
 
 julia> callingframe(itrig)      # step out one (non-inlined) frame
 Inference triggered to call MethodInstance for double(::Float64) from mymap (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:710) with specialization MethodInstance for mymap(::typeof(SnoopCompile.ItrigHigherOrderDemo.double), ::Vector{Any})
 
 julia> skiphigherorder(itrig)   # step out to frame that doesn't have `double` as a function-argument
Inference triggered to call MethodInstance for double(::Float64) from callmymap (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:711) with specialization MethodInstance for callmymap(::Vector{Any})
Warning

By default skiphigherorder is conservative, and insists on being sure that it's the callee being passed to the higher-order function. Higher-order functions that do not get specialized (e.g., with ::Function argument types) will not be skipped over. You can pass exact=false to allow ::Function to also be passed over, but keep in mind that this may falsely skip some frames.

source
SnoopCompile.InferenceTriggerType
InferenceTrigger(callee::MethodInstance, callerframes::Vector{StackFrame}, btidx::Int, bt)

Organize information about the "triggers" of inference. callee is the MethodInstance requiring inference; callerframes, btidx, and bt contain information about the caller. callerframes are the frame(s) of the call site that triggered inference; it's a Vector{StackFrame}, rather than a single StackFrame, due to the possibility that the caller was inlined into something else, in which case the first entry is the direct caller and the last entry corresponds to the MethodInstance into which it was ultimately inlined. btidx is the index in bt, the backtrace collected upon entry into inference, corresponding to callerframes.

InferenceTriggers are created by calling inference_triggers. See also: callerinstance and callingframe.

source
SnoopCompile.runtime_inferencetimeFunction
ridata = runtime_inferencetime(tinf::InferenceTimingNode; consts=true, by=inclusive)
ridata = runtime_inferencetime(tinf::InferenceTimingNode, profiledata; lidict, consts=true, by=inclusive)

Compare runtime and inference-time on a per-method basis. ridata[m::Method] returns (trun, tinfer, nspecializations), measuring the approximate amount of time spent running m, inferring m, and the number of type-specializations, respectively. trun is estimated from profiling data, which the user is responsible for capturing before the call. Typically tinf is collected via @snoop_inference on the first call (in a fresh session) to a workload, and the profiling data collected on a subsequent call. In some cases you may need to repeat the workload several times to collect enough profiling samples.

profiledata and lidict are obtained from Profile.retrieve().

source
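A sketch of the full workflow; workload() is a hypothetical stand-in for your own code:

julia> using SnoopCompileCore; tinf = @snoop_inference workload();   # first call, in a fresh session

julia> using SnoopCompile, Profile

julia> @profile workload()   # profile the now-compiled call

julia> ridata = runtime_inferencetime(tinf);   # uses the current Profile data

julia> data, lidict = Profile.retrieve();   # or capture and pass it explicitly

julia> ridata = runtime_inferencetime(tinf, data; lidict);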
SnoopCompile.parcelFunction
ttot, pcs = SnoopCompile.parcel(tinf::InferenceTimingNode)

Parcel the "root-most" precompilable MethodInstances into separate modules. These can be used to generate precompile directives to cache the results of type-inference, reducing latency on first use.

Loosely speaking, a MethodInstance is precompilable if the module that owns the method also has access to all the types it needs to precompile the instance. When the root node of an entrance to inference is not itself precompilable, parcel examines the children (and possibly, children's children...) until it finds the first node on each branch that is precompilable. MethodInstances are then assigned to the module that owns the method.

ttot is the total inference time; pcs is a list of module => (tmod, pclist) pairs. For each module, tmod is the amount of inference time affiliated with methods owned by that module; pclist is a list of (t, mi) time/MethodInstance tuples.

See also: SnoopCompile.write.

Example

We'll use SnoopCompile.itrigs_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.itrigs_demo()
 InferenceTimingNode: 0.004490576/0.004711168 on Core.Compiler.Timings.ROOT() with 2 direct children
 
 julia> ttot, pcs = SnoopCompile.parcel(tinf);
@@ -162,7 +162,7 @@
 
 julia> pcs
 1-element Vector{Pair{Module, Tuple{Float64, Vector{Tuple{Float64, Core.MethodInstance}}}}}:
 SnoopCompile.ItrigDemo => (0.000220592, [(9.8986e-5, MethodInstance for double(::Float64)), (0.000121606, MethodInstance for double(::UInt8))])

Since there was only one module, ttot is the same as tmod. The ItrigDemo module had two precompilable MethodInstances, each listed with its corresponding inclusive time.

source
modtrigs = SnoopCompile.parcel(mtrigs::AbstractVector{MethodTriggers})

Split method-based triggers into collections organized by the module in which the methods were defined. Returns a module => list vector, with the module having the most MethodTriggers last.

source
SnoopCompile.writeFunction
write(prefix::AbstractString, pc::Dict; always::Bool = false)

Write each module's precompile directives to a separate file. If always is true, the generated function will always run the precompile statements when called; otherwise the statements will only be called during package precompilation.

source
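A hedged sketch combining parcel and write; the Dict conversion matches the signature above, and the output prefix is arbitrary:

julia> ttot, pcs = SnoopCompile.parcel(tinf);

julia> SnoopCompile.write("/tmp/precompiles_mypkg", Dict(pcs))   # one file per module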

Analysis of LLVM

SnoopCompile.read_snoop_llvmFunction
times, info = SnoopCompile.read_snoop_llvm("func_names.csv", "llvm_timings.yaml"; tmin_secs=0.0)

Reads the log files produced by the compiler and returns structured representations.

The results will only contain modules that took longer than tmin_secs to optimize.

Return value

  • times contains the time spent optimizing each module, as a Pair from the time to an array of Strings, one for every MethodInstance in that llvm module.

  • info is a Dict containing statistics for each MethodInstance encountered, from before and after optimization, including number of instructions and number of basicblocks.

Example

julia> @snoop_llvm "func_names.csv" "llvm_timings.yaml" begin
            using InteractiveUtils
            @eval InteractiveUtils.peakflops()
        end
@@ -181,7 +181,7 @@
 Dict{String, NamedTuple{(:before, :after), Tuple{NamedTuple{(:instructions, :basicblocks), Tuple{Int64, Int64}}, NamedTuple{(:instructions, :basicblocks), Tuple{Int64, Int64}}}}} with 3 entries:
   "Tuple{typeof(LinearAlgebra.copy_transpose!), Ar… => (before = (instructions = 651, basicblocks = 83), after = (instructions = 348, basicblocks = 40…
   "Tuple{typeof(Base.copyto!), Array{Float64, 2}, … => (before = (instructions = 617, basicblocks = 77), after = (instructions = 397, basicblocks = 37…
-  "Tuple{typeof(LinearAlgebra._generic_matmatmul!)… => (before = (instructions = 4796, basicblocks = 824), after = (instructions = 1421, basicblocks =…
source

Demos

SnoopCompile.flatten_demoFunction
tinf = SnoopCompile.flatten_demo()

A simple demonstration of @snoop_inference. This demo defines a module

module FlattenDemo
+  "Tuple{typeof(LinearAlgebra._generic_matmatmul!)… => (before = (instructions = 4796, basicblocks = 824), after = (instructions = 1421, basicblocks =…
source

Demos

SnoopCompile.flatten_demoFunction
tinf = SnoopCompile.flatten_demo()

A simple demonstration of @snoop_inference. This demo defines a module

module FlattenDemo
     struct MyType{T} x::T end
     extract(y::MyType) = y.x
     function packintype(x)
@@ -193,14 +193,14 @@
         return y*x + 2*x + 5
     end
     dostuff(y) = domath(extract(y))
end

It then "warms up" (forces inference on) all of Julia's Base methods needed for domath, to ensure that these MethodInstances do not need to be inferred when we collect the data. It then returns the results of

@snoop_inference FlattenDemo.packintypes(1)

See flatten for an example usage.

source
SnoopCompile.itrigs_demoFunction
tinf = SnoopCompile.itrigs_demo()

A simple demonstration of collecting inference triggers. This demo defines a module

module ItrigDemo
 @noinline double(x) = 2x
 @inline calldouble1(c) = double(c[1])
 calldouble2(cc) = calldouble1(cc[1])
 calleach(ccs) = (calldouble2(ccs[1]), calldouble2(ccs[2]))
 end

It then "warms up" (forces inference on) calldouble2(::Vector{Vector{Any}}), calldouble1(::Vector{Any}), double(::Int):

cc = [Any[1]]
 ItrigDemo.calleach([cc,cc])

Then it collects and returns inference data using

cc1, cc2 = [Any[0x01]], [Any[1.0]]
@snoop_inference ItrigDemo.calleach([cc1, cc2])

This does not require any new inference for calldouble2 or calldouble1, but it does force inference on double with two new types. See inference_triggers to see what gets collected and returned.

source
SnoopCompile.itrigs_higherorder_demoFunction
tinf = SnoopCompile.itrigs_higherorder_demo()

A simple demonstration of handling higher-order methods with inference triggers. This demo defines a module

module ItrigHigherOrderDemo
 double(x) = 2x
 @noinline function mymap!(f, dst, src)
     for i in eachindex(dst, src)
@@ -210,4 +210,4 @@
 end
 @noinline mymap(f::F, src) where F = mymap!(f, Vector{Any}(undef, length(src)), src)
 callmymap(src) = mymap(double, src)
end

The key feature of this set of definitions is that the function double gets passed as an argument through mymap and mymap! (the latter are higher-order functions).

It then "warms up" (forces inference on) callmymap(::Vector{Any}), mymap(::typeof(double), ::Vector{Any}), mymap!(::typeof(double), ::Vector{Any}, ::Vector{Any}) and double(::Int):

ItrigHigherOrderDemo.callmymap(Any[1, 2])

Then it collects and returns inference data using

@snoop_inference ItrigHigherOrderDemo.callmymap(Any[1.0, 2.0])

which forces inference for double(::Float64).

See skiphigherorder for an example using this demo.

source
diff --git a/dev/tutorials/Blackjack/Manifest.toml b/dev/tutorials/Blackjack/Manifest.toml index 109cdce6..194d2c29 100644 --- a/dev/tutorials/Blackjack/Manifest.toml +++ b/dev/tutorials/Blackjack/Manifest.toml @@ -1,6 +1,6 @@ # This file is machine-generated - editing it directly is not advised -julia_version = "1.11.1" +julia_version = "1.11.2" manifest_format = "2.0" project_hash = "589544d5c1b7901218959d36b38dbddc69bae7e1" diff --git a/dev/tutorials/Blackjack/Project.toml b/dev/tutorials/Blackjack/Project.toml index 11af44b2..8ef805d9 100644 --- a/dev/tutorials/Blackjack/Project.toml +++ b/dev/tutorials/Blackjack/Project.toml @@ -1,5 +1,5 @@ name = "Blackjack" -uuid = "495d6eb2-adb7-44b3-9b04-06edd18ef5f7" +uuid = "450e08fe-cd9c-4499-89bc-3fed4667e45e" authors = ["runner "] version = "0.1.0" diff --git a/dev/tutorials/BlackjackFacecards/Manifest.toml b/dev/tutorials/BlackjackFacecards/Manifest.toml index aff1c695..c5f2fe7f 100644 --- a/dev/tutorials/BlackjackFacecards/Manifest.toml +++ b/dev/tutorials/BlackjackFacecards/Manifest.toml @@ -1,13 +1,13 @@ # This file is machine-generated - editing it directly is not advised -julia_version = "1.11.1" +julia_version = "1.11.2" manifest_format = "2.0" -project_hash = "e15908031510810c11414b9ad88781149e1fd1ef" +project_hash = "6fbd77bc28da6031e99faafcb1c4a01449d0f6ea" [[deps.Blackjack]] deps = ["PrecompileTools"] path = "/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/Blackjack" -uuid = "495d6eb2-adb7-44b3-9b04-06edd18ef5f7" +uuid = "450e08fe-cd9c-4499-89bc-3fed4667e45e" version = "0.1.0" [[deps.Dates]] diff --git a/dev/tutorials/BlackjackFacecards/Project.toml b/dev/tutorials/BlackjackFacecards/Project.toml index 15f1bdfb..b6e82873 100644 --- a/dev/tutorials/BlackjackFacecards/Project.toml +++ b/dev/tutorials/BlackjackFacecards/Project.toml @@ -1,7 +1,7 @@ name = "BlackjackFacecards" -uuid = "7483e28d-947b-4029-9dd0-e6b6208f7528" +uuid = "ca994158-d634-4055-ae5d-b84576c8df41" authors = ["runner "] version = "0.1.0" [deps] -Blackjack = "495d6eb2-adb7-44b3-9b04-06edd18ef5f7" +Blackjack = "450e08fe-cd9c-4499-89bc-3fed4667e45e" diff --git a/dev/tutorials/func_names.csv b/dev/tutorials/func_names.csv index a9e3ae74..ac09101b 100644 --- a/dev/tutorials/func_names.csv +++ b/dev/tutorials/func_names.csv @@ -1,19 +1,40 @@ -julia_NamedTuple_533 Tuple{Type{NamedTuple{(:eltype, :ntrials, :parallel), T} where T<:Tuple}, Tuple{DataType, Int64, Bool}} -julia_peakflops_536 Tuple{typeof(InteractiveUtils.peakflops)} -julia_#peakflops#81_546 Tuple{InteractiveUtils.var"##peakflops#81", DataType, Int64, Bool, typeof(InteractiveUtils.peakflops), Int64} -julia_Signed_558 Tuple{Type{Signed}, UInt64} -julia_getproperty_569 Tuple{typeof(Base.getproperty), Base.MappingRF{typeof(Base.identity), typeof(Base.min)}, Symbol} -julia_peakflops_579 Tuple{typeof(Core.kwcall), NamedTuple{(:eltype, :ntrials, :parallel), Tuple{DataType, Int64, Bool}}, typeof(LinearAlgebra.peakflops), Int64} -julia_#peakflops#349_584 Tuple{LinearAlgebra.var"##peakflops#349", DataType, Int64, Bool, typeof(LinearAlgebra.peakflops), Int64} -julia_mapreduce_impl_651 Tuple{typeof(Base.mapreduce_impl), typeof(Base.identity), typeof(Base.min), Array{Float64, 1}, Int64, Int64} -julia_ones_657 Tuple{typeof(Base.ones), Type{Float64}, Int64, Int64} -julia_throw_boundserror_681 Tuple{typeof(Base.throw_boundserror), Array{Float64, 2}, Tuple{Int64}} -julia_*_684 Tuple{typeof(Base.:(*)), Array{Float64, 2}, Array{Float64, 2}} -julia_gemm_wrapper!_700 
Tuple{typeof(LinearAlgebra.gemm_wrapper!), Array{Float64, 2}, Char, Char, Array{Float64, 2}, Array{Float64, 2}, LinearAlgebra.MulAddMul{true, true, Bool, Bool}} -julia_wrap_745 Tuple{typeof(LinearAlgebra.wrap), Array{Float64, 2}, Char} -julia_throw_uplo_781 Tuple{typeof(LinearAlgebra.throw_uplo)} -julia_gemm!_784 Tuple{typeof(LinearAlgebra.BLAS.gemm!), Char, Char, Float64, Array{Float64, 2}, Array{Float64, 2}, Float64, Array{Float64, 2}} -julia_matmul3x3!_806 Tuple{typeof(LinearAlgebra.matmul3x3!), Array{Float64, 2}, Char, Char, Array{Float64, 2}, Array{Float64, 2}, LinearAlgebra.MulAddMul{true, true, Bool, Bool}} -julia_throw_boundserror_815 Tuple{typeof(Base.throw_boundserror), Array{Float64, 2}, Tuple{Int64, Int64}} -julia_matmul2x2!_821 Tuple{typeof(LinearAlgebra.matmul2x2!), Array{Float64, 2}, Char, Char, Array{Float64, 2}, Array{Float64, 2}, LinearAlgebra.MulAddMul{true, true, Bool, Bool}} -julia_getindex_824 Tuple{typeof(Base.getindex), Array{Float64, 2}, Int64, Int64} +julia_NamedTuple_537 Tuple{Type{NamedTuple{(:eltype, :ntrials, :parallel), T} where T<:Tuple}, Tuple{DataType, Int64, Bool}} +julia_peakflops_540 Tuple{typeof(InteractiveUtils.peakflops)} +julia_#peakflops#81_550 Tuple{InteractiveUtils.var"##peakflops#81", DataType, Int64, Bool, typeof(InteractiveUtils.peakflops), Int64} +julia_Signed_562 Tuple{Type{Signed}, UInt64} +julia_getproperty_573 Tuple{typeof(Base.getproperty), Base.MappingRF{typeof(Base.identity), typeof(Base.min)}, Symbol} +julia_peakflops_583 Tuple{typeof(Core.kwcall), NamedTuple{(:eltype, :ntrials, :parallel), Tuple{DataType, Int64, Bool}}, typeof(LinearAlgebra.peakflops), Int64} +julia_#peakflops#349_588 Tuple{LinearAlgebra.var"##peakflops#349", DataType, Int64, Bool, typeof(LinearAlgebra.peakflops), Int64} +julia_mapreduce_impl_655 Tuple{typeof(Base.mapreduce_impl), typeof(Base.identity), typeof(Base.min), Array{Float64, 1}, Int64, Int64} +julia_ones_661 Tuple{typeof(Base.ones), Type{Float64}, Int64, Int64} +julia_throw_boundserror_685 Tuple{typeof(Base.throw_boundserror), Array{Float64, 2}, Tuple{Int64}} +julia_in_688 Tuple{typeof(Base.in), Tuple{Char, Char, Char}} +julia_ntuple_692 Tuple{typeof(Base.ntuple), Base.Returns{Bool}, Base.Val{2}} +julia_CartesianIndex_697 Tuple{Type{Base.IteratorsMD.CartesianIndex{N} where N}, Tuple{Int64, Int64}} +julia__chkstride1_700 Tuple{typeof(LinearAlgebra._chkstride1), Bool} +julia_==_704 Tuple{typeof(Base.:(==)), Char, Char} +julia__any_tuple_709 Tuple{typeof(Base._any_tuple), Function, Bool} +julia_<=_713 Tuple{typeof(Base.:(<=)), Char, Char} +julia_in_720 Tuple{typeof(Base.in), Char, Tuple{Char, Char, Char}} +julia_in_730 Tuple{typeof(Base.in), Char, Tuple{Char, Char}} +julia_MulAddMul_740 Tuple{Type{LinearAlgebra.MulAddMul{ais1, bis0, TA, TB} where TB where TA where bis0 where ais1}, Bool, Bool} +julia_getproperty_747 Tuple{typeof(Base.getproperty), LinearAlgebra.MulAddMul{true, true, Bool, Bool}, Symbol} +julia_iszero_752 Tuple{typeof(Base.iszero), Bool} +julia_promote_755 Tuple{typeof(Base.promote), Bool, Bool, Float64} +julia_indexed_iterate_760 Tuple{typeof(Base.indexed_iterate), Tuple{Float64, Float64, Float64}, Int64} +julia_indexed_iterate_767 Tuple{typeof(Base.indexed_iterate), Tuple{Float64, Float64, Float64}, Int64, Int64} +julia_argtail_774 Tuple{typeof(Base.argtail), Function} +julia_DimensionMismatch_776 Tuple{Type{Base.DimensionMismatch}, String} +julia_convert_780 Tuple{typeof(Base.convert), Type{Float64}, Float64} +julia_map_782 Tuple{typeof(Base.map), Base.Fix2{typeof(Base.in), Tuple{Char, 
Char, Char}}, Tuple{Char, Char}} +julia_all_794 Tuple{typeof(Base.all), Tuple{Bool, Bool}} +julia_*_798 Tuple{typeof(Base.:(*)), Array{Float64, 2}, Array{Float64, 2}} +julia_gemm_wrapper!_814 Tuple{typeof(LinearAlgebra.gemm_wrapper!), Array{Float64, 2}, Char, Char, Array{Float64, 2}, Array{Float64, 2}, LinearAlgebra.MulAddMul{true, true, Bool, Bool}} +julia_wrap_859 Tuple{typeof(LinearAlgebra.wrap), Array{Float64, 2}, Char} +julia_throw_uplo_895 Tuple{typeof(LinearAlgebra.throw_uplo)} +julia_gemm!_898 Tuple{typeof(LinearAlgebra.BLAS.gemm!), Char, Char, Float64, Array{Float64, 2}, Array{Float64, 2}, Float64, Array{Float64, 2}} +julia_matmul3x3!_920 Tuple{typeof(LinearAlgebra.matmul3x3!), Array{Float64, 2}, Char, Char, Array{Float64, 2}, Array{Float64, 2}, LinearAlgebra.MulAddMul{true, true, Bool, Bool}} +julia_throw_boundserror_929 Tuple{typeof(Base.throw_boundserror), Array{Float64, 2}, Tuple{Int64, Int64}} +julia_matmul2x2!_935 Tuple{typeof(LinearAlgebra.matmul2x2!), Array{Float64, 2}, Char, Char, Array{Float64, 2}, Array{Float64, 2}, LinearAlgebra.MulAddMul{true, true, Bool, Bool}} +julia_getindex_938 Tuple{typeof(Base.getindex), Array{Float64, 2}, Int64, Int64} +julia_==_953 Tuple{typeof(Base.:(==)), Float64, Int64} diff --git a/dev/tutorials/invalidations/index.html b/dev/tutorials/invalidations/index.html index 3d3e3b47..e96c7a59 100644 --- a/dev/tutorials/invalidations/index.html +++ b/dev/tutorials/invalidations/index.html @@ -15,14 +15,14 @@ [fa267f1f] + TOML v1.0.3 [4ec0a83e] + Unicode v1.11.0 Precompiling project... - 591.6 ms ✓ Blackjack + 284.0 ms ✓ Blackjack 1 dependency successfully precompiled in 1 seconds. 6 already precompiled.
julia> Pkg.generate("BlackjackFacecards"); Generating project BlackjackFacecards: BlackjackFacecards/Project.toml BlackjackFacecards/src/BlackjackFacecards.jl
julia> Pkg.activate("BlackjackFacecards") Activating project at `~/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/BlackjackFacecards`
julia> Pkg.develop(PackageSpec(path=joinpath(pwd(), "Blackjack"))); Resolving package versions... Updating `~/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/BlackjackFacecards/Project.toml` - [495d6eb2] + Blackjack v0.1.0 `~/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/Blackjack` + [450e08fe] + Blackjack v0.1.0 `~/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/Blackjack` Updating `~/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/BlackjackFacecards/Manifest.toml` - [495d6eb2] + Blackjack v0.1.0 `~/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/Blackjack` + [450e08fe] + Blackjack v0.1.0 `~/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/Blackjack` [aea7be01] + PrecompileTools v1.2.1 [21216c6a] + Preferences v1.4.3 [ade2ca70] + Dates v1.11.0 @@ -79,10 +79,10 @@ end """)214
Warning

Because BlackjackFacecards "owns" neither Char nor score, this is piracy and should generally be avoided. Piracy is one way to cause invalidations, but it's not the only one. BlackjackFacecards could avoid committing piracy by defining a struct Facecard ... end and defining score(card::Facecard) instead of score(card::Char). However, this would not fix the invalidations: all the factors described below are unchanged.

Now we're ready!

Recording invalidations

Here are the steps executed by the code below

julia> using SnoopCompileCore
julia> invs = @snoop_invalidations using Blackjack, BlackjackFacecards;Precompiling Blackjack... - 647.1 msBlackjack - 1 dependency successfully precompiled in 1 seconds. 6 already precompiled. + 443.3 msBlackjack + 1 dependency successfully precompiled in 0 seconds. 6 already precompiled. Precompiling BlackjackFacecards... - 466.3 msBlackjackFacecards + 311.3 msBlackjackFacecards 1 dependency successfully precompiled in 0 seconds. 7 already precompiled.
julia> using SnoopCompile, AbstractTrees
Tip

If you get errors like Package SnoopCompileCore not found in current path, a likely explanation is that you didn't add it to your default environment. In the example above, we're in the BlackjackFacecards environment so we can develop the package, but you also need access to SnoopCompile and SnoopCompileCore. Having these in your default environment lets them be found even if they aren't part of the current environment.

Analyzing invalidations

Now we're ready to see what, if anything, got invalidated:

julia> trees = invalidation_trees(invs)
1-element Vector{SnoopCompile.MethodInvalidations}:
  inserting score(card::Char) @ BlackjackFacecards ~/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/BlackjackFacecards/src/BlackjackFacecards.jl:6 invalidated:
    mt_backedges: 1: signature Tuple{typeof(Blackjack.score), Any} triggered MethodInstance for Blackjack.tallyscores(::Vector{Any}) (1 children)

This has only one "tree" of invalidations. trees is a Vector so we can index it:

julia> tree = trees[1]
inserting score(card::Char) @ BlackjackFacecards ~/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/BlackjackFacecards/src/BlackjackFacecards.jl:6 invalidated:
@@ -98,4 +98,4 @@
         s += invokelatest(score, card)
     end
     return s
end

This forces Julia to always look up the appropriate method of score while the code is running, and thus prevents the speculative optimizations that leave the code vulnerable to invalidation. However, the cost is that your code may run somewhat more slowly, particularly here where the call is inside a loop.
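For reference, here is a minimal sketch of the full function; the signature is reconstructed from the invalidation report above, so treat it as an assumption rather than the package's exact code:

function tallyscores(cards::Vector{Any})
    s = 0
    for card in cards
        s += invokelatest(score, card)   # dynamic lookup on every call; immune to invalidation
    end
    return s
end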

If you plan to define at least two score methods, another way to turn off this optimization would be to declare

Base.Experimental.@max_methods 1 function score end

before defining any score methods. You can read the documentation on @max_methods to learn more about how it works.

Tip

Most of us learn best by doing. Try at least one of these methods of fixing the invalidation, and use SnoopCompile to verify that it works.

Undoing the damage from invalidations

If you can't prevent the invalidation, an alternative approach is to recompile the invalidated code. For example, one could repeat the precompile workload from Blackjack in BlackjackFacecards. While this will mean that the whole "stack" will be compiled twice and cached twice (which is wasteful), it should be effective in reducing latency for users.

PrecompileTools also has a @recompile_invalidations. This isn't generally recommended for use in packages (you can end up with long compile times for things you don't need), but it can be useful in personal "startup packages" where you want to reduce latency for a particular project you're working on. See the PrecompileTools documentation for details.
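A minimal sketch of such a startup package (MyStartup is a hypothetical name):

module MyStartup
using PrecompileTools
@recompile_invalidations begin
    using Blackjack, BlackjackFacecards
end
end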

  Activating project at `~/work/SnoopCompile.jl/SnoopCompile.jl/docs`
diff --git a/dev/tutorials/jet/index.html b/dev/tutorials/jet/index.html index 66c4c32f..5d22117b 100644 --- a/dev/tutorials/jet/index.html +++ b/dev/tutorials/jet/index.html @@ -78,4 +78,4 @@ │││││││││││││││┌ reduce_empty(::typeof(+), ::Type{Any}) @ Base ./reduce.jl:343 ││││││││││││││││┌ zero(::Type{Any}) @ Base ./missing.jl:106 │││││││││││││││││ MethodError: no method matching zero(::Type{Any}): Base.throw(Base.MethodError(zero, tuple(Base.Any)::Tuple{DataType})::MethodError) -││││││││││││││││└────────────────────

Because SnoopCompileCore collected the runtime-dispatched sum call, we can pass it to JET. report_callees filters those calls which generate JET reports, allowing you to focus on potential errors.
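A sketch of that workflow, with mysum and data as hypothetical stand-ins for your own workload:

julia> using SnoopCompileCore; tinf = @snoop_inference mysum(data);

julia> using SnoopCompile, JET, Cthulhu

julia> report_callees(inference_triggers(tinf))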

Note

JET integration is enabled only if JET.jl and Cthulhu.jl have been loaded into your main session. This is why the example includes the using JET, Cthulhu statement.

diff --git a/dev/tutorials/llvm_timings.yaml b/dev/tutorials/llvm_timings.yaml index 44047aff..c532f659 100644 --- a/dev/tutorials/llvm_timings.yaml +++ b/dev/tutorials/llvm_timings.yaml @@ -1,219 +1,450 @@ - before: - "julia_NamedTuple_533": + "julia_NamedTuple_537": instructions: 25 basicblocks: 1 - time_ns: 1063097 + time_ns: 1021783 optlevel: 2 after: - "julia_NamedTuple_533": + "julia_NamedTuple_537": instructions: 18 basicblocks: 1 - before: - "julia_peakflops_536": + "julia_peakflops_540": instructions: 31 basicblocks: 1 - time_ns: 835562 + time_ns: 748781 optlevel: 1 after: - "julia_peakflops_536": + "julia_peakflops_540": instructions: 20 basicblocks: 1 - before: - "julia_#peakflops#81_546": + "julia_#peakflops#81_550": instructions: 62 basicblocks: 1 - time_ns: 828489 + time_ns: 678700 optlevel: 1 after: - "julia_#peakflops#81_546": + "julia_#peakflops#81_550": instructions: 63 basicblocks: 1 - before: - "julia_Signed_558": + "julia_Signed_562": instructions: 47 basicblocks: 7 - time_ns: 1189373 + time_ns: 977288 optlevel: 2 after: - "julia_Signed_558": + "julia_Signed_562": instructions: 34 basicblocks: 3 - before: - "julia_getproperty_569": + "julia_getproperty_573": instructions: 45 basicblocks: 3 - time_ns: 2252851 + time_ns: 1036890 optlevel: 2 after: - "julia_getproperty_569": + "julia_getproperty_573": instructions: 26 basicblocks: 3 - before: - time_ns: 252883 + time_ns: 88586 optlevel: 2 after: - before: - "julia_mapreduce_impl_651": + "julia_mapreduce_impl_655": instructions: 1043 basicblocks: 124 - time_ns: 10682147 + time_ns: 9202329 optlevel: 2 after: - "julia_mapreduce_impl_651": + "julia_mapreduce_impl_655": instructions: 223 basicblocks: 17 - before: - "julia_#peakflops#349_584": + "julia_#peakflops#349_588": instructions: 830 basicblocks: 100 - time_ns: 9584756 + time_ns: 8958813 optlevel: 2 after: - "julia_#peakflops#349_584": + "julia_#peakflops#349_588": instructions: 305 basicblocks: 29 - before: - "julia_peakflops_579": + "julia_peakflops_583": instructions: 27 basicblocks: 1 - time_ns: 767144 + time_ns: 583281 optlevel: 2 after: - "julia_peakflops_579": + "julia_peakflops_583": instructions: 14 basicblocks: 1 - before: - "julia_throw_boundserror_681": + "julia_throw_boundserror_685": instructions: 27 basicblocks: 3 - time_ns: 679120 + time_ns: 542495 optlevel: 2 after: - "julia_throw_boundserror_681": + "julia_throw_boundserror_685": instructions: 10 basicblocks: 1 - before: - "julia_ones_657": + "julia_ones_661": instructions: 265 basicblocks: 32 - time_ns: 4518516 + time_ns: 4123686 optlevel: 2 after: - "julia_ones_657": + "julia_ones_661": instructions: 94 basicblocks: 14 - before: - time_ns: 144329 + "julia_in_688": + instructions: 18 + basicblocks: 1 + time_ns: 1061897 + optlevel: 2 + after: + "julia_in_688": + instructions: 9 + basicblocks: 1 +- + before: + "julia_ntuple_692": + instructions: 26 + basicblocks: 1 + time_ns: 726019 + optlevel: 2 + after: + "julia_ntuple_692": + instructions: 11 + basicblocks: 1 +- + before: + "julia_CartesianIndex_697": + instructions: 16 + basicblocks: 1 + time_ns: 735355 + optlevel: 2 + after: + "julia_CartesianIndex_697": + instructions: 9 + basicblocks: 1 +- + before: + "julia__chkstride1_700": + instructions: 24 + basicblocks: 4 + time_ns: 918399 + optlevel: 2 + after: + "julia__chkstride1_700": + instructions: 13 + basicblocks: 3 +- + before: + "julia_==_704": + instructions: 24 + basicblocks: 1 + time_ns: 540191 + optlevel: 2 + after: + "julia_==_704": + instructions: 10 + basicblocks: 1 +- + before: + 
"julia__any_tuple_709": + instructions: 22 + basicblocks: 3 + time_ns: 788345 + optlevel: 2 + after: + "julia__any_tuple_709": + instructions: 14 + basicblocks: 1 +- + before: + "julia_<=_713": + instructions: 38 + basicblocks: 1 + time_ns: 858807 + optlevel: 2 + after: + "julia_<=_713": + instructions: 10 + basicblocks: 1 +- + before: + "julia_in_720": + instructions: 90 + basicblocks: 25 + time_ns: 1985666 + optlevel: 2 + after: + "julia_in_720": + instructions: 21 + basicblocks: 3 +- + before: + "julia_in_730": + instructions: 90 + basicblocks: 25 + time_ns: 1662070 + optlevel: 2 + after: + "julia_in_730": + instructions: 15 + basicblocks: 1 +- + before: + "julia_MulAddMul_740": + instructions: 78 + basicblocks: 7 + time_ns: 1438662 + optlevel: 2 + after: + "julia_MulAddMul_740": + instructions: 45 + basicblocks: 8 +- + before: + "julia_getproperty_747": + instructions: 33 + basicblocks: 3 + time_ns: 759130 + optlevel: 2 + after: + "julia_getproperty_747": + instructions: 16 + basicblocks: 3 +- + before: + "julia_iszero_752": + instructions: 17 + basicblocks: 1 + time_ns: 585516 + optlevel: 2 + after: + "julia_iszero_752": + instructions: 10 + basicblocks: 1 +- + before: + "julia_promote_755": + instructions: 30 + basicblocks: 1 + time_ns: 947343 + optlevel: 2 + after: + "julia_promote_755": + instructions: 19 + basicblocks: 1 +- + before: + "julia_indexed_iterate_760": + instructions: 32 + basicblocks: 3 + time_ns: 951591 + optlevel: 2 + after: + "julia_indexed_iterate_760": + instructions: 19 + basicblocks: 3 +- + before: + "julia_indexed_iterate_767": + instructions: 32 + basicblocks: 3 + time_ns: 952432 + optlevel: 2 + after: + "julia_indexed_iterate_767": + instructions: 19 + basicblocks: 3 +- + before: + "julia_argtail_774": + instructions: 12 + basicblocks: 1 + time_ns: 577850 + optlevel: 2 + after: + "julia_argtail_774": + instructions: 8 + basicblocks: 1 +- + before: + "julia_DimensionMismatch_776": + instructions: 19 + basicblocks: 1 + time_ns: 749492 optlevel: 2 after: + "julia_DimensionMismatch_776": + instructions: 9 + basicblocks: 1 - before: - "julia_throw_boundserror_815": + "julia_convert_780": + instructions: 12 + basicblocks: 1 + time_ns: 678439 + optlevel: 2 + after: + "julia_convert_780": + instructions: 8 + basicblocks: 1 +- + before: + "julia_map_782": + instructions: 196 + basicblocks: 53 + time_ns: 3607039 + optlevel: 2 + after: + "julia_map_782": + instructions: 38 + basicblocks: 5 +- + before: + "julia_all_794": + instructions: 26 + basicblocks: 1 + time_ns: 745224 + optlevel: 2 + after: + "julia_all_794": + instructions: 15 + basicblocks: 1 +- + before: + time_ns: 142687 + optlevel: 2 + after: +- + before: + "julia_throw_boundserror_929": instructions: 38 basicblocks: 3 - time_ns: 1039242 + time_ns: 1026440 optlevel: 2 after: - "julia_throw_boundserror_815": + "julia_throw_boundserror_929": instructions: 28 basicblocks: 1 - before: - "julia_matmul2x2!_821": + "julia_matmul2x2!_935": instructions: 10425 basicblocks: 1122 - time_ns: 11073849 + time_ns: 11115598 optlevel: 2 after: - "julia_matmul2x2!_821": + "julia_matmul2x2!_935": instructions: 177 basicblocks: 20 - before: - "julia_matmul3x3!_806": + "julia_matmul3x3!_920": instructions: 22969 basicblocks: 2452 - time_ns: 19759084 + time_ns: 19761345 optlevel: 2 after: - "julia_matmul3x3!_806": + "julia_matmul3x3!_920": instructions: 300 basicblocks: 20 - before: - "julia_gemm!_784": + "julia_gemm!_898": instructions: 1069 basicblocks: 148 - time_ns: 6166727 + time_ns: 5968619 optlevel: 2 after: - 
"julia_gemm!_784": + "julia_gemm!_898": instructions: 199 basicblocks: 12 - before: - "julia_throw_uplo_781": + "julia_throw_uplo_895": instructions: 32 basicblocks: 3 - time_ns: 807199 + time_ns: 798534 optlevel: 2 after: - "julia_throw_uplo_781": + "julia_throw_uplo_895": instructions: 25 basicblocks: 1 - before: - "julia_wrap_745": + "julia_wrap_859": instructions: 1322 basicblocks: 99 - time_ns: 10924570 + time_ns: 10718085 optlevel: 2 after: - "julia_wrap_745": + "julia_wrap_859": instructions: 283 basicblocks: 35 - before: - "julia_gemm_wrapper!_700": + "julia_gemm_wrapper!_814": instructions: 1109 basicblocks: 148 - time_ns: 8441569 + time_ns: 8077203 optlevel: 2 after: - "julia_gemm_wrapper!_700": + "julia_gemm_wrapper!_814": instructions: 187 basicblocks: 15 - before: - "julia_*_684": + "julia_*_798": instructions: 151 basicblocks: 15 - time_ns: 2171879 + time_ns: 2097975 optlevel: 2 after: - "julia_*_684": + "julia_*_798": instructions: 68 basicblocks: 7 - before: - "julia_getindex_824": + "julia_getindex_938": instructions: 189 basicblocks: 18 - time_ns: 1880475 + time_ns: 1940591 optlevel: 2 after: - "julia_getindex_824": + "julia_getindex_938": instructions: 29 basicblocks: 3 +- + before: + "julia_==_953": + instructions: 35 + basicblocks: 1 + time_ns: 725177 + optlevel: 2 + after: + "julia_==_953": + instructions: 17 + basicblocks: 1 diff --git a/dev/tutorials/pgdsgui/index.html b/dev/tutorials/pgdsgui/index.html index 3418064c..86de41c0 100644 --- a/dev/tutorials/pgdsgui/index.html +++ b/dev/tutorials/pgdsgui/index.html @@ -83,4 +83,4 @@ MethodInstance for save(::String, ::Vector{SomePkg.SomeDataType{SubString{String}}}) MethodInstance for save(::SubString{String}, ::Array) MethodInstance for save(::String, ::Vector{var"#s92"} where var"#s92"<:SomePkg.SomeDataType) - MethodInstance for save(::String, ::Array)

In this case we have 7 MethodInstances (some of which are clearly due to poor inferrability of the caller) when one might suffice.

diff --git a/dev/tutorials/snoop_inference/index.html b/dev/tutorials/snoop_inference/index.html index 3673f9f1..e6cbf0a3 100644 --- a/dev/tutorials/snoop_inference/index.html +++ b/dev/tutorials/snoop_inference/index.html @@ -45,4 +45,4 @@ MethodInstance for FlattenDemo.packintype(::Int64)

Each node in this tree is accompanied by a pair of numbers. The first number is the exclusive inference time (in seconds), meaning the time spent inferring the particular MethodInstance, not including the time spent inferring its callees. The second number is the inclusive time, which is the exclusive time plus the time spent on the callees. Therefore, the inclusive time is always at least as large as the exclusive time.

The ROOT node is a bit different: its exclusive time measures the time spent on all operations except inference. In this case, we see that the entire call took approximately 3.3ms, of which 2.7ms was spent on activities besides inference. Almost all of that was code-generation, but it also includes the time needed to run the code. Just 0.55ms was needed to run type-inference on this entire series of calls. As you will quickly discover, inference takes much more time on more complicated code.
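The tree can be printed directly via the AbstractTrees interface; a hedged sketch, where maxdepth is an AbstractTrees keyword that limits how deep the printout goes:

julia> using AbstractTrees

julia> print_tree(tinf; maxdepth=2)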

We can also display this tree as a flame graph, using the ProfileView.jl package:

julia> fg = flamegraph(tinf)
 Node(FlameGraphs.NodeData(ROOT() at typeinfer.jl:75, 0x00, 0:10080857))
julia> using ProfileView
 
julia> ProfileView.view(fg)

You should see something like this:

flamegraph

Users are encouraged to read the ProfileView documentation to understand how to interpret this, but briefly:

You can explore this flamegraph and compare it to the output from print_tree.

Note

Orange-yellow boxes that appear at the base of a flame are worth special attention, and may represent something that you thought you had precompiled. For example, suppose your workload "exercises" myfun(args...; warn=true), so you might think you have myfun covered for the corresponding argument types. But constant-propagation (as indicated by the orange-yellow coloration) results in (re)compilation for specific values: if Julia has decided that myfun merits constant-propagation, a call myfun(args...; warn=false) might need to be compiled separately.

When you want to prevent constant-propagation from hurting your TTFX, you have two options:

Finally, flatten, on its own or together with accumulate_by_source, allows you to get a sense of the cost of individual MethodInstances or Methods.
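For example, a hedged sketch reusing the tinf collected above:

julia> mflat = flatten(tinf);   # one InferenceTiming entry per MethodInstance

julia> accumulate_by_source(mflat)   # aggregate inference time by Method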

The tools here allow you to get an overview of where inference is spending its time. This gives you insight into the major contributors to latency.

diff --git a/dev/tutorials/snoop_inference_analysis/index.html b/dev/tutorials/snoop_inference_analysis/index.html index 6272dccd..c3ea624f 100644 --- a/dev/tutorials/snoop_inference_analysis/index.html +++ b/dev/tutorials/snoop_inference_analysis/index.html @@ -1,5 +1,5 @@ -Using @snoop_inference results to improve inferrability · SnoopCompile

Using @snoop_inference results to improve inferrability

Throughout this page, we'll use the OptimizeMe demo, which ships with SnoopCompile.

Note

To understand what follows, it's essential to refer to OptimizeMe source code as you follow along.

julia> using SnoopCompileCore, SnoopCompile # here we need the SnoopCompile path for the next line (normally you should wait until after data collection is complete)
julia> include(joinpath(pkgdir(SnoopCompile), "examples", "OptimizeMe.jl"))
Main.var"Main".OptimizeMe

julia> tinf = @snoop_inference OptimizeMe.main();
lotsa containers:

julia> fg = flamegraph(tinf)
Node(FlameGraphs.NodeData(ROOT() at typeinfer.jl:79, 0x00, 0:1401678903))

If you visualize fg with ProfileView, you may see something like this:

flamegraph-OptimizeMe

From the standpoint of precompilation, this has some obvious problems:

  • even though we called a single method, OptimizeMe.main(), there are many distinct flames separated by blank spaces. This indicates that many calls are being made by runtime dispatch: each separate flame is a fresh entrance into inference.
  • several of the flames are marked in red, indicating that they are not naively precompilable (see the Tutorial on @snoop_inference). While @compile_workload can handle these flames, an even more robust solution is to eliminate them altogether.

Our goal will be to improve the design of OptimizeMe to make it more readily precompilable.

Analyzing inference triggers

We'll first extract the "triggers" of inference, which is just a repackaging of part of the information contained within tinf. Specifically, an InferenceTrigger captures callee/caller relationships that straddle a fresh entrance to type-inference, allowing you to identify which calls were made by runtime dispatch and what MethodInstance they called.

julia> itrigs = inference_triggers(tinf)
37-element Vector{InferenceTrigger}:
  Inference triggered to call Main.var"Main".OptimizeMe.main() from eval (./boot.jl:430) inlined into cd(::Documenter.var"#64#66"{Module}, ::String) (./file.jl:112)
  Inference triggered to call similar(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Type{Main.var"Main".OptimizeMe.Container{Int64}}) from copy (./broadcast.jl:907) inlined into Main.var"Main".OptimizeMe.lotsa_containers() (/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:15)
  Inference triggered to call setindex!(::Vector{Main.var"Main".OptimizeMe.Container{Int64}}, ::Main.var"Main".OptimizeMe.Container{Int64}, ::Int64) from copy (./broadcast.jl:908) inlined into Main.var"Main".OptimizeMe.lotsa_containers() (/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:15)
@@ -19,18 +19,18 @@
  Inference triggered to call show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Vector{Int64}}) from #sprint#592 (./strings/io.jl:112) with specialization Base.var"#sprint#592"(::IOContext{Base.PipeEndpoint}, ::Int64, ::typeof(sprint), ::Function, ::String, ::Vararg{Any})
  Inference triggered to call show(::IOContext{IOBuffer}, ::String, ::Main.var"Main".OptimizeMe.Container{Tuple{String, Int64}}) from #sprint#592 (./strings/io.jl:112) with specialization Base.var"#sprint#592"(::IOContext{Base.PipeEndpoint}, ::Int64, ::typeof(sprint), ::Function, ::String, ::Vararg{Any})
  Inference triggered to call Main.var"Main".OptimizeMe.howbig(::Float64) from #1 (/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29) with specialization (::Main.var"Main".OptimizeMe.var"#1#2")(::Float64)
- Inference triggered to call Base.collect_to_with_first!(::Vector{Float64}, ::Float64, ::Base.Generator{Vector{Float64}, Main.var"Main".OptimizeMe.var"#1#2"}, ::Int64) from _collect (./array.jl:810) with specialization Base._collect(::Vector{Float64}, ::Base.Generator{Vector{Float64}, Main.var"Main".OptimizeMe.var"#1#2"}, ::Base.EltypeUnknown, ::Base.HasShape{1})
+ Inference triggered to call Base.collect_to_with_first!(::Vector{Float64}, ::Float64, ::Base.Generator{Vector{Float64}, Main.var"Main".OptimizeMe.var"#1#2"}, ::Int64) from _collect (./array.jl:821) with specialization Base._collect(::Vector{Float64}, ::Base.Generator{Vector{Float64}, Main.var"Main".OptimizeMe.var"#1#2"}, ::Base.EltypeUnknown, ::Base.HasShape{1})

The number of elements in this Vector{InferenceTrigger} tells you how many calls were (1) made by runtime dispatch and (2) the callee had not previously been inferred.

Tip

In the REPL, SnoopCompile displays InferenceTriggers with yellow coloration for the callee, red for the caller method, and blue for the caller specialization. This makes it easier to quickly identify the most important information.

In some cases, this might indicate that you'll need to fix each case separately; fortunately, in many cases fixing one problem addresses many others.

Method triggers

Most often, it's most convenient to organize them by the method triggering the need for inference:

julia> mtrigs = accumulate_by_source(Method, itrigs)
11-element Vector{SnoopCompile.TaggedTriggers{Method}}:
+ _collect(c, itr, ::Base.EltypeUnknown, isz::Union{Base.HasLength, Base.HasShape}) @ Base array.jl:808 (1 callees from 1 callers)
+ display(d::TextDisplay, M::MIME{Symbol("text/plain")}, x) @ Base.Multimedia multimedia.jl:254 (1 callees from 1 callers)
+ print_matrix_row(io::IO, X::AbstractVecOrMat, A::Vector, i::Integer, cols::AbstractVector, sep::AbstractString, idxlast::Integer) @ Base arrayshow.jl:97 (1 callees from 1 callers)
  cd(f::Function, dir::AbstractString) @ Base.Filesystem file.jl:107 (1 callees from 1 callers)
+ typeinfo_prefix(io::IO, X) @ Base arrayshow.jl:568 (1 callees from 1 callers)
  (::Main.var"Main".OptimizeMe.var"#1#2")(x) @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29 (1 callees from 1 callers)
- print_matrix_row(io::IO, X::AbstractVecOrMat, A::Vector, i::Integer, cols::AbstractVector, sep::AbstractString, idxlast::Integer) @ Base arrayshow.jl:97 (1 callees from 1 callers)
- display(d::TextDisplay, M::MIME{Symbol("text/plain")}, x) @ Base.Multimedia multimedia.jl:254 (1 callees from 1 callers)
- typeinfo_prefix(io::IO, X) @ Base arrayshow.jl:562 (1 callees from 1 callers)
- _collect(c, itr, ::Base.EltypeUnknown, isz::Union{Base.HasLength, Base.HasShape}) @ Base array.jl:797 (1 callees from 1 callers)
  copyto_nonleaf!(dest, bc::Base.Broadcast.Broadcasted, iter, state, count) @ Base.Broadcast broadcast.jl:1071 (2 callees from 1 callers)
  lotsa_containers() @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:13 (3 callees from 1 callers)
  var"#sprint#592"(context, sizehint::Integer, ::typeof(sprint), f::Function, args...) @ Base strings/io.jl:107 (8 callees from 2 callers)
  alignment(io::IO, X::AbstractVecOrMat, rows::AbstractVector{T}, cols::AbstractVector{V}, cols_if_complete::Integer, cols_otherwise::Integer, sep::Integer, ncols::Integer) where {T, V} @ Base arrayshow.jl:60 (9 callees from 1 callers)
- _show_default(io::IO, x) @ Base show.jl:481 (9 callees from 1 callers)

+ _show_default(io::IO, x) @ Base show.jl:484 (9 callees from 1 callers)

The methods triggering the largest number of inference runs are shown at the bottom. You can also select methods from a particular module:

julia> modtrigs = filtermod(OptimizeMe, mtrigs)2-element Vector{SnoopCompile.TaggedTriggers{Method}}:
  (::Main.var"Main".OptimizeMe.var"#1#2")(x) @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29 (1 callees from 1 callers)
  lotsa_containers() @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:13 (3 callees from 1 callers)

Rather than filter by a single module, you can alternatively call SnoopCompile.parcel(mtrigs) to split them out by module. In this case, most of the triggers came from Base, not OptimizeMe. However, many of the failures in Base were nevertheless indirectly due to OptimizeMe: our methods in OptimizeMe call Base methods with arguments that trigger internal inference failures. Fortunately, we'll see that using more careful design in OptimizeMe can avoid many of those problems.
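A minimal sketch of that per-module split (the variable name is illustrative):

mtrigs_by_module = SnoopCompile.parcel(mtrigs)   # group method triggers by the module that owns each method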

Tip

If you have a longer list of inference triggers than you feel comfortable tackling, filtering by your package's module or using precompile_blockers can be a good way to start. Fixing issues in the package itself can end up resolving many of the "indirect" triggers too. Also be sure to note the ability to filter out likely "noise" from test suites.
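Here is a hedged sketch of the precompile_blockers route; it assumes you also collected invalidation data, and MyPkg and its workload are hypothetical:

using SnoopCompileCore
invs = @snoop_invalidations using MyPkg          # hypothetical package
tinf = @snoop_inference MyPkg.workload()         # hypothetical workload
using SnoopCompile                               # load the analysis code only after collection
blockers = precompile_blockers(invs, tinf)       # just the triggers that block MyPkg's precompilation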

You can get an overview of each Method trigger with summary:

julia> mtrig = modtrigs[1](::Main.var"Main".OptimizeMe.var"#1#2")(x) @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29 (1 callees from 1 callers)
julia> summary(mtrig)(::Main.var"Main".OptimizeMe.var"#1#2")(x) @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29 had 1 specializations
MethodInstance for (::Main.var"Main".OptimizeMe.var"#1#2")(::Float64) has Core.Box (fix this before tackling other problems, see https://timholy.github.io/SnoopCompile.jl/stable/snoop_invalidations/#Fixing-Core.Box)
@@ -110,4 +110,4 @@
 222
 julia> length(itrigsel)
71


While there is some risk of discarding triggers that provide clues about the origin of other triggers (e.g., they would have shown up in the same branch of the trigger_tree), the shorter list may help direct your attention to the "real" issues.
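Returning to the Core.Box flagged in the summary above: as a generic illustration (this is not the OptimizeMe source), the usual culprit and its fix look like:

function boxed()
    x = 1
    x = x + 1        # reassigning a variable that is also captured forces a Core.Box
    return () -> x
end

function unboxed()
    x = 1
    x = x + 1
    return let x = x # rebinding with `let` gives the closure a fresh, unboxed capture
        () -> x
    end
end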

diff --git a/dev/tutorials/snoop_inference_parcel/index.html b/dev/tutorials/snoop_inference_parcel/index.html
index cdc31461..aa67d7df 100644
--- a/dev/tutorials/snoop_inference_parcel/index.html
+++ b/dev/tutorials/snoop_inference_parcel/index.html
@@ -1,14 +1,14 @@


Using @snoop_inference to emit manual precompile directives

In a few cases, it may be inconvenient or impossible to precompile using a workload. Some examples might be:

  • an application that opens graphical windows
  • an application that connects to a database
  • an application that creates, deletes, or rewrites files on disk

In such cases, one alternative is to create a manual list of precompile directives using Julia's precompile(f, argtypes) function.

Warning

Manual precompile directives are much more likely to "go stale" as the package is developed: precompile does not throw an error if a method for the given argtypes cannot be found. They are also more likely to depend on the Julia version, operating system, or CPU architecture. Whenever possible, it's safer to use a workload.
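For illustration (myfun is hypothetical, not part of OptimizeMe), note how precompile reports failure only through its return value:

myfun(x::Int, y::String) = string(y, x)
precompile(myfun, (Int, String))   # returns true: inference ran and the result is cached
precompile(myfun, (Int, Int))      # returns false: no matching method, yet no error is thrown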

precompile directives have to be emitted by the module that owns the method and/or types. SnoopCompile comes with a tool, parcel, that splits out the "root-most" precompilable MethodInstances into their constituent modules. These will typically correspond to the bottom row of boxes in the flame graph. In cases where some are not naively precompilable, the results will include MethodInstances from higher up in the call tree.

Let's use SnoopCompile.parcel on our OptimizeMe demo:

julia> using SnoopCompileCore, SnoopCompile # here we need the SnoopCompile path for the next line (normally you should wait until after data collection is complete)
julia> include(joinpath(pkgdir(SnoopCompile), "examples", "OptimizeMe.jl"))Main.var"Main".OptimizeMe
julia> tinf = @snoop_inference OptimizeMe.main();lotsa containers:
julia> ttot, pcs = SnoopCompile.parcel(tinf);
julia> ttot0.064550989
julia> pcs4-element Vector{Pair{Module, Tuple{Float64, Vector{Tuple{Float64, Core.MethodInstance}}}}}: + Core => (1.924e-6, [(1.924e-6, MethodInstance for (NamedTuple{(:sizehint,)})(::Tuple{Int64}))]) + Base.Multimedia => (4.198e-6, [(4.198e-6, MethodInstance for MIME(::String))]) + Base => (0.0029538160000000006, [(1.623e-6, MethodInstance for LinearIndices(::Tuple{Base.OneTo{Int64}})), (1.623e-6, MethodInstance for IOContext(::IOBuffer, ::IOContext{Base.PipeEndpoint})), (3.767e-6, MethodInstance for IOContext(::IOContext{Base.PipeEndpoint}, ::Base.ImmutableDict{Symbol, Any})), (6.042e-6, MethodInstance for Base.indexed_iterate(::Pair{Symbol, Any}, ::Int64, ::Int64)), (6.111e-6, MethodInstance for Base.indexed_iterate(::Tuple{Int64, Int64}, ::Int64, ::Int64)), (6.363e-6, MethodInstance for Base.indexed_iterate(::Tuple{Any, Int64}, ::Int64, ::Int64)), (6.582e-6, MethodInstance for Base.indexed_iterate(::Tuple{String, Bool}, ::Int64, ::Int64)), (6.953e-6, MethodInstance for getindex(::Tuple{Int64, Int64}, ::Int64)), (7.224e-6, MethodInstance for getindex(::Tuple{Base.OneTo{Int64}}, ::Int64)), (8.666e-6, MethodInstance for getproperty(::Module, ::Symbol)) … (2.5759e-5, MethodInstance for getproperty(::UnionAll, ::Symbol)), (2.6229e-5, MethodInstance for getproperty(::DataType, ::Symbol)), (2.9025e-5, MethodInstance for getproperty(::BitVector, ::Symbol)), (3.1599e-5, MethodInstance for getproperty(::Vector, ::Symbol)), (9.8513e-5, MethodInstance for LinearIndices(::Vector{Float64})), (0.00025691099999999997, MethodInstance for haskey(::IOContext{Base.PipeEndpoint}, ::Symbol)), (0.000266438, MethodInstance for print(::IOContext{Base.PipeEndpoint}, ::Char)), (0.000328927, MethodInstance for get(::IOContext{Base.PipeEndpoint}, ::Symbol, ::Type{Any})), (0.00040065500000000003, MethodInstance for get(::IOContext{Base.PipeEndpoint}, ::Symbol, ::Bool)), (0.0013133589999999998, MethodInstance for string(::String, ::Int64, ::String))]) + Main.var"Main".OptimizeMe => (0.023183100999999998, [(7.4248e-5, MethodInstance for Main.var"Main".OptimizeMe.howbig(::Float64)), (0.023108853, MethodInstance for Main.var"Main".OptimizeMe.main())])

ttot shows the total amount of time spent on type inference. parcel discovered precompilable MethodInstances for four modules (Core, Base.Multimedia, Base, and OptimizeMe) that might benefit from precompile directives. These are listed in increasing order of inference time.

Let's look specifically at OptimizeMe, since that's under our control:

julia> pcmod = pcs[end]Main.var"Main".OptimizeMe => (0.023183100999999998, Tuple{Float64, Core.MethodInstance}[(7.4248e-5, MethodInstance for Main.var"Main".OptimizeMe.howbig(::Float64)), (0.023108853, MethodInstance for Main.var"Main".OptimizeMe.main())])
julia> tmod, tpcs = pcmod.second;
julia> tmod0.023183100999999998
julia> tpcs2-element Vector{Tuple{Float64, Core.MethodInstance}}: + (7.4248e-5, MethodInstance for Main.var"Main".OptimizeMe.howbig(::Float64)) + (0.023108853, MethodInstance for Main.var"Main".OptimizeMe.main())

This indicates the amount of time spent specifically on OptimizeMe, plus the list of calls that could be precompiled in that module.

We could look at the other modules (packages) similarly.
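For instance, a short sketch that prints each module's total (it only assumes the pcs computed above):

for (mod, (tmod, tpcs)) in pcs
    println(mod, ": ", tmod, " s of inference over ", length(tpcs), " precompilable MethodInstances")
end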

SnoopCompile.write

You can generate files that contain ready-to-use precompile directives using SnoopCompile.write:

julia> SnoopCompile.write("/tmp/precompiles_OptimizeMe", pcs)Core: no precompile statements out of 1.924e-6
+Base.Multimedia: no precompile statements out of 4.198e-6
+Base: precompiled 0.0013133589999999998 out of 0.0029538160000000006
+Main.var"Main".OptimizeMe: precompiled 0.023108853 out of 0.023183100999999998

You'll now find a directory /tmp/precompiles_OptimizeMe containing files for the modules that could have precompile directives added manually. The contents of the last of these should be recognizable:

function _precompile_()
     ccall(:jl_generating_output, Cint, ()) == 1 || return nothing
     Base.precompile(Tuple{typeof(main)})   # time: 0.4204474
-end

The first ccall line ensures we only pay the cost of running these precompile directives if we're building the package; this is relevant mostly if you're running Julia with --compiled-modules=no, which can be a convenient way to disable precompilation and examine packages in their "native state." (It would also matter if you've set __precompile__(false) at the top of your module, but if so why are you reading this?)

This file is ready to be moved into the OptimizeMe repository and included into your module definition.

You might also consider submitting some of the other files (or their precompile directives) to the packages you depend on.

+end

The first ccall line ensures we only pay the cost of running these precompile directives if we're building the package; this is relevant mostly if you're running Julia with --compiled-modules=no, which can be a convenient way to disable precompilation and examine packages in their "native state." (It would also matter if you've set __precompile__(false) at the top of your module, but if so, why are you reading this?)

This file is ready to be moved into the OptimizeMe repository and included into your module definition.
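Concretely, the wiring might look like this sketch (the file name precompile_OptimizeMe.jl is an assumption; use whatever name you gave the generated file):

module OptimizeMe

# ... existing method definitions ...

include("precompile_OptimizeMe.jl")   # the file written by SnoopCompile.write
_precompile_()                        # run the directives while the package precompiles

end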

You might also consider submitting some of the other files (or their precompile directives) to the packages you depend on.

diff --git a/dev/tutorials/snoop_llvm/index.html b/dev/tutorials/snoop_llvm/index.html
index f6368baa..e500b1fa 100644
--- a/dev/tutorials/snoop_llvm/index.html
+++ b/dev/tutorials/snoop_llvm/index.html
@@ -8,12 +8,12 @@
 The function `iterate` exists, but no method is defined for this combination of argument types.
 Closest candidates are:
-  iterate(::Test.GenericString)
-    @ Test /opt/hostedtoolcache/julia/1.11.1/x64/share/julia/stdlib/v1.11/Test/src/Test.jl:2195
-  iterate(::Test.GenericString, ::Integer)
-    @ Test /opt/hostedtoolcache/julia/1.11.1/x64/share/julia/stdlib/v1.11/Test/src/Test.jl:2195
-  iterate(::Base.MethodSpecializations)
-    @ Base reflection.jl:1299
+  iterate(::CompositeException, Any...)
+    @ Base task.jl:55
+  iterate(::LibGit2.GitConfigIter)
+    @ LibGit2 /opt/hostedtoolcache/julia/1.11.2/x64/share/julia/stdlib/v1.11/LibGit2/src/config.jl:225
+  iterate(::LibGit2.GitConfigIter, ::Any)
+    @ LibGit2 /opt/hostedtoolcache/julia/1.11.2/x64/share/julia/stdlib/v1.11/LibGit2/src/config.jl:225
 ...

This will write two files, "func_names.csv" and "llvm_timings.yaml", in your current working directory. Let's look at what was read from these files:

julia> timesERROR: UndefVarError: `times` not defined in `Main.var"Main"`
 Suggestion: check for spelling errors or missing imports.
julia> infoERROR: UndefVarError: `info` not defined in `Main.var"Main"`
 Suggestion: check for spelling errors or missing imports.
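As the errors show, times and info were never defined in this session because the snooped workload above failed. For reference, the intended sequence is sketched below; it assumes read_snoop_llvm accepts the two file names written earlier:

using SnoopCompileCore
@snoop_llvm "func_names.csv" "llvm_timings.yaml" begin
    @eval sum(rand(10))   # substitute your own workload; it runs in a fresh process
end

using SnoopCompile
times, info = SnoopCompile.read_snoop_llvm("func_names.csv", "llvm_timings.yaml")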