From b9cfdfb943d6060e264ce34eac17c6451eafd638 Mon Sep 17 00:00:00 2001
From: "Documenter.jl"
Date: Sun, 24 Nov 2024 16:07:49 +0000
Subject: [PATCH] build based on ae3aad4

---
 dev/.documenter-siteinfo.json                 |  2 +-
 dev/explanations/basic/index.html             |  2 +-
 dev/explanations/fixing_inference/index.html  |  2 +-
 dev/explanations/gotchas/index.html           |  2 +-
 dev/explanations/tools/index.html             |  2 +-
 dev/index.html                                |  2 +-
 dev/reference/index.html                      | 46 +++++++++----------
 dev/tutorials/Blackjack/Project.toml          |  2 +-
 .../BlackjackFacecards/Manifest.toml          |  4 +-
 dev/tutorials/BlackjackFacecards/Project.toml |  4 +-
 dev/tutorials/invalidations/index.html        | 12 ++---
 dev/tutorials/jet/index.html                  |  2 +-
 dev/tutorials/llvm_timings.yaml               | 42 ++++++++---------
 dev/tutorials/pgdsgui/index.html              |  2 +-
 dev/tutorials/snoop_inference/index.html      |  2 +-
 .../snoop_inference_analysis/index.html       |  8 ++--
 .../snoop_inference_parcel/index.html         | 22 ++++-----
 dev/tutorials/snoop_llvm/index.html           |  2 +-
 18 files changed, 80 insertions(+), 80 deletions(-)

diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index b35cdef1..49f7038e 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.11.1","generation_timestamp":"2024-11-24T11:43:02","documenter_version":"1.8.0"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.11.1","generation_timestamp":"2024-11-24T16:07:42","documenter_version":"1.8.0"}}
\ No newline at end of file
diff --git a/dev/explanations/basic/index.html b/dev/explanations/basic/index.html
index 3e56d31b..fbccb6c0 100644
--- a/dev/explanations/basic/index.html
+++ b/dev/explanations/basic/index.html
@@ -1,2 +1,2 @@
-Understanding SnoopCompile and Julia's compilation pipeline · SnoopCompile

Understanding SnoopCompile and Julia's compilation pipeline

Julia uses Just-in-time (JIT) compilation to generate the code that runs on your CPU. Broadly speaking, there are two major compilation steps: inference and code generation. Inference is the process of determining the type of each object, which in turn determines which specific methods get called; once type inference is complete, code generation performs optimizations and ultimately generates the assembly language (native code) used on CPUs. Some aspects of this process are documented here.

Using code that has never been compiled requires that it first be JIT-compiled, and this contributes to the latency of using the package. In some circumstances, you can cache (store) the results of compilation to files to reduce the latency when your package is used. These files are the *.ji and *.so files that live in the compiled directory of your Julia depot, usually located at ~/.julia/compiled. However, if these files become large, loading them can be another source of latency. Julia needs time both to load and validate the cached compiled code. Minimizing the latency of using a package involves focusing on caching the compilation of code that is both commonly used and takes time to compile.

Caching code for later use is called precompilation. Julia has had some forms of precompilation almost since the very first packages. However, it was Julia 1.9 that first supported "complete" precompilation, including the ability to store native code in shared-library cache files.

SnoopCompile is designed to let you analyze the costs of JIT-compilation, identify key bottlenecks that contribute to latency, and set up precompile directives to see whether they produce measurable benefits.

Package precompilation

When a package is precompiled, here's what happens under the hood:

  • Julia loads all of the package's dependencies (the ones in the [deps] section of the Project.toml file), typically from precompile cache files
  • Julia evaluates the source code (text files) that define the package module(s). Evaluating function foo(args...) ... end creates a new method foo. Note that:
    • the source code might also contain statements that create "data" (e.g., consts). In some cases this can lead to some subtle precompilation "gotchas"
    • the source code might also contain a precompile workload, which forces compilation and tracking of package methods.
  • Julia iterates over the module contents and writes the result to disk. Note that the module contents might include compiled code, and if so it is written along with everything else to the cache file.

When Julia loads your package, it just loads the "snapshot" stored in the cache file: it does not re-evaluate the source-text files that defined your package! It is appropriate to think of the source files of your package as "build scripts" that create your module; once the "build scripts" are executed, it's the module itself that gets cached, and the job of the build scripts is done.

+Understanding SnoopCompile and Julia's compilation pipeline · SnoopCompile

Understanding SnoopCompile and Julia's compilation pipeline

Julia uses Just-in-time (JIT) compilation to generate the code that runs on your CPU. Broadly speaking, there are two major compilation steps: inference and code generation. Inference is the process of determining the type of each object, which in turn determines which specific methods get called; once type inference is complete, code generation performs optimizations and ultimately generates the assembly language (native code) used on CPUs. Some aspects of this process are documented here.

Using code that has never been compiled requires that it first be JIT-compiled, and this contributes to the latency of using the package. In some circumstances, you can cache (store) the results of compilation to files to reduce the latency when your package is used. These files are the *.ji and *.so files that live in the compiled directory of your Julia depot, usually located at ~/.julia/compiled. However, if these files become large, loading them can be another source of latency. Julia needs time both to load and validate the cached compiled code. Minimizing the latency of using a package involves focusing on caching the compilation of code that is both commonly used and takes time to compile.

Caching code for later use is called precompilation. Julia has had some forms of precompilation almost since the very first packages. However, it was Julia 1.9 that first supported "complete" precompilation, including the ability to store native code in shared-library cache files.

SnoopCompile is designed to let you analyze the costs of JIT-compilation, identify key bottlenecks that contribute to latency, and set up precompile directives to see whether they produce measurable benefits.
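
As a rough illustration of what a precompile directive looks like, the sketch below uses Base's `precompile` function on a made-up function, `double_all`. Neither the function name nor the argument types come from SnoopCompile, and real packages would more typically rely on a PrecompileTools workload.

```julia
# `double_all` is a hypothetical stand-in for a method defined in your package.
double_all(xs) = [2x for x in xs]

# A precompile directive: request compilation for one specific combination of
# argument types, without actually calling the function.
ok = precompile(double_all, (Vector{Int},))

# A later call with matching argument types can reuse the compiled code.
result = double_all([1, 2, 3])
```

When a directive like this runs while a package is being precompiled, the resulting specialization becomes a candidate for storage in the package's cache file.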

Package precompilation

When a package is precompiled, here's what happens under the hood:

  • Julia loads all of the package's dependencies (the ones in the [deps] section of the Project.toml file), typically from precompile cache files
  • Julia evaluates the source code (text files) that define the package module(s). Evaluating function foo(args...) ... end creates a new method foo. Note that:
    • the source code might also contain statements that create "data" (e.g., consts). In some cases this can lead to some subtle precompilation "gotchas"
    • the source code might also contain a precompile workload, which forces compilation and tracking of package methods.
  • Julia iterates over the module contents and writes the result to disk. Note that the module contents might include compiled code, and if so it is written along with everything else to the cache file.

When Julia loads your package, it just loads the "snapshot" stored in the cache file: it does not re-evaluate the source-text files that defined your package! It is appropriate to think of the source files of your package as "build scripts" that create your module; once the "build scripts" are executed, it's the module itself that gets cached, and the job of the build scripts is done.

diff --git a/dev/explanations/fixing_inference/index.html b/dev/explanations/fixing_inference/index.html
index aede1b96..51189b3a 100644
--- a/dev/explanations/fixing_inference/index.html
+++ b/dev/explanations/fixing_inference/index.html
@@ -50,4 +50,4 @@
         return getfield(d, :maker)::Union{String,Symbol}
     end
     return getfield(d, name)
-end

Julia's constant propagation will ensure that most accesses of those fields will be determined at compile-time, so this simple change robustly fixes many inference problems.

Fixing Core.Box

Julia issue 15276 is one of the more surprising forms of inference failure; it is the most common cause of a Core.Box annotation. If other variables depend on the Boxed variable, then a single Core.Box can lead to widespread inference problems. For this reason, these are also among the first inference problems you should tackle.

Read this explanation of why this happens and what you can do to fix it. If you are directed to find Core.Box inference triggers via suggest, you may need to explore around the call site a bit: the inference trigger may be in the closure itself, but the fix needs to go in the method that creates the closure.

Use of ascend is highly recommended for fixing Core.Box inference failures.

Handling edge cases

You can sometimes get invalidations from failing to handle "formal" possibilities. For example, operations with regular expressions might return a Union{Nothing, RegexMatch}. You can sometimes get poor type inference by writing code that fails to take account of the possibility that nothing might be returned. For example, a comprehension

ms = [m.match for m in match.((rex,), my_strings)]

might be replaced with

ms = [m.match for m in match.((rex,), my_strings) if m !== nothing]

and return a better-typed result.

+end

Julia's constant propagation will ensure that most accesses of those fields will be determined at compile-time, so this simple change robustly fixes many inference problems.

Fixing Core.Box

Julia issue 15276 is one of the more surprising forms of inference failure; it is the most common cause of a Core.Box annotation. If other variables depend on the Boxed variable, then a single Core.Box can lead to widespread inference problems. For this reason, these are also among the first inference problems you should tackle.

Read this explanation of why this happens and what you can do to fix it. If you are directed to find Core.Box inference triggers via suggest, you may need to explore around the call site a bit: the inference trigger may be in the closure itself, but the fix needs to go in the method that creates the closure.

Use of ascend is highly recommended for fixing Core.Box inference failures.

Handling edge cases

You can sometimes get invalidations from failing to handle "formal" possibilities. For example, operations with regular expressions might return a Union{Nothing, RegexMatch}. You can sometimes get poor type inference by writing code that fails to take account of the possibility that nothing might be returned. For example, a comprehension

ms = [m.match for m in match.((rex,), my_strings)]

might be replaced with

ms = [m.match for m in match.((rex,), my_strings) if m !== nothing]

and return a better-typed result.
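
For a concrete feel of the difference, here is a small sketch with made-up inputs (`rex` and `my_strings` are placeholders chosen for illustration). Without the filter, accessing `m.match` on a `nothing` entry raises an error; with it, only genuine matches are collected.

```julia
rex = r"\d+"                                    # hypothetical pattern: a run of digits
my_strings = ["abc 123", "no digits", "x 45"]   # the middle string has no match

# Broadcasting `match` yields a mix of RegexMatch objects and `nothing`.
raw = match.((rex,), my_strings)

# Filtering out `nothing` before accessing `.match` keeps only real matches.
ms = [m.match for m in raw if m !== nothing]
```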

diff --git a/dev/explanations/gotchas/index.html b/dev/explanations/gotchas/index.html
index f111b208..391b28ef 100644
--- a/dev/explanations/gotchas/index.html
+++ b/dev/explanations/gotchas/index.html
@@ -2,4 +2,4 @@ Precompilation "gotcha"s · SnoopCompile

Precompilation "gotcha"s

Running code during module definition

Suppose you're working on an astronomy package and your source code has a line

const planets = map(makeplanet, ["Mercury", ...])

Julia will dutifully create planets and store it in the package's precompile cache file. This also runs makeplanet, and if this is the first time it gets run, it will compile makeplanet. Assuming that makeplanet is a method defined in the package, the compiled code for makeplanet will be stored in the cache file.

However, two circumstances can lead to puzzling omissions from the cache files:

  • if makeplanet is a method defined in a dependency of your package, it will not be cached in your package. You'd want to add precompilation of makeplanet to the package that creates that method.
  • if makeplanet is poorly inferred and uses runtime dispatch, any such callees that are not owned by your package will not be cached. For example, suppose makeplanet ends up calling methods in Base Julia or its standard libraries that are not precompiled into Julia itself: the compiled code for those methods will not be added to the cache file.

One option to ensure this dependent code gets cached is to create planets inside PrecompileTools.@compile_workload:

@compile_workload begin
     global planets
     const planets = map(makeplanet, ["Mercury", ...])
-end

Note that your package definition can have multiple @compile_workload blocks.

+end

Note that your package definition can have multiple @compile_workload blocks.

diff --git a/dev/explanations/tools/index.html b/dev/explanations/tools/index.html
index e7911730..94486d9d 100644
--- a/dev/explanations/tools/index.html
+++ b/dev/explanations/tools/index.html
@@ -1,2 +1,2 @@
-Package roles and alternatives · SnoopCompile

Package roles and alternatives

SnoopCompileCore

SnoopCompileCore is a tiny package with no dependencies; it's used for collecting data, and it has been designed in such a way that it cannot cause any invalidations of its own. Collecting data on invalidations and inference with SnoopCompileCore is the only way you can be sure you are observing the "native state" of your code.

SnoopCompile

SnoopCompile is a much larger package that performs analysis on the data collected by SnoopCompileCore; loading SnoopCompile can (and does) trigger invalidations. Consequently, you're urged to always collect data with just SnoopCompileCore loaded, and wait to load SnoopCompile until after you've finished collecting the data.

Cthulhu

Cthulhu is a companion package that gives deep insights into the origin of invalidations or inference failures.

AbstractTrees

AbstractTrees is the one package in this list that can be both a "workhorse" and a developer tool. SnoopCompile uses it mostly for pretty-printing.

JET

JET is a powerful developer tool that in some ways is an alternative to SnoopCompile. While the two have different goals, the packages have some overlap in what they can tell you about your code. However, their mechanisms of action are fundamentally different:

  • JET is a "static analyzer," which means that it analyzes the code itself. JET can tell you about inference failures (runtime dispatch) much like SnoopCompile, with a major advantage: SnoopCompileCore omits information about any callees that are already compiled, but JET's @report_opt provides exhaustive information about the entire inferable callgraph (i.e., the part of the callgraph that inference can predict from the initial call) regardless of whether it has been previously compiled. With JET, you don't have to remember to run each analysis in a fresh session.

  • SnoopCompileCore collects data by watching normal inference at work. On code that hasn't been compiled previously, this can yield results similar to JET's, with a different major advantage: JET can't "see through" runtime dispatch, but SnoopCompileCore can. With SnoopCompile, you can immediately get a holistic view of your entire callgraph.

Combining JET and SnoopCompile can provide insights that are difficult to obtain with either package in isolation. See the Tutorial on JET integration.

+Package roles and alternatives · SnoopCompile

Package roles and alternatives

SnoopCompileCore

SnoopCompileCore is a tiny package with no dependencies; it's used for collecting data, and it has been designed in such a way that it cannot cause any invalidations of its own. Collecting data on invalidations and inference with SnoopCompileCore is the only way you can be sure you are observing the "native state" of your code.

SnoopCompile

SnoopCompile is a much larger package that performs analysis on the data collected by SnoopCompileCore; loading SnoopCompile can (and does) trigger invalidations. Consequently, you're urged to always collect data with just SnoopCompileCore loaded, and wait to load SnoopCompile until after you've finished collecting the data.

Cthulhu

Cthulhu is a companion package that gives deep insights into the origin of invalidations or inference failures.

AbstractTrees

AbstractTrees is the one package in this list that can be both a "workhorse" and a developer tool. SnoopCompile uses it mostly for pretty-printing.

JET

JET is a powerful developer tool that in some ways is an alternative to SnoopCompile. While the two have different goals, the packages have some overlap in what they can tell you about your code. However, their mechanisms of action are fundamentally different:

  • JET is a "static analyzer," which means that it analyzes the code itself. JET can tell you about inference failures (runtime dispatch) much like SnoopCompile, with a major advantage: SnoopCompileCore omits information about any callees that are already compiled, but JET's @report_opt provides exhaustive information about the entire inferable callgraph (i.e., the part of the callgraph that inference can predict from the initial call) regardless of whether it has been previously compiled. With JET, you don't have to remember to run each analysis in a fresh session.

  • SnoopCompileCore collects data by watching normal inference at work. On code that hasn't been compiled previously, this can yield results similar to JET's, with a different major advantage: JET can't "see through" runtime dispatch, but SnoopCompileCore can. With SnoopCompile, you can immediately get a holistic view of your entire callgraph.

Combining JET and SnoopCompile can provide insights that are difficult to obtain with either package in isolation. See the Tutorial on JET integration.

diff --git a/dev/index.html b/dev/index.html
index b33d2c59..05d97f80 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -1,2 +1,2 @@
-SnoopCompile.jl · SnoopCompile

SnoopCompile.jl

Julia is fast, but its execution speed depends on optimizing code through compilation. Code must be compiled before you can use it, and unfortunately compilation is slow. This can cause latency the first time you use code: this latency is often called time-to-first-plot (TTFP) or more generally time-to-first-execution (TTFX). If something feels slow the first time you use it, and fast thereafter, you're probably experiencing the latency of compilation. Note that TTFX is distinct from time-to-load (TTL, which refers to the time you spend waiting for using MyPkg to finish), even though both contribute to latency.

Modern versions of Julia can store compiled code to disk (precompilation) to reduce or eliminate latency. Users and developers who are interested in reducing TTFX should first head to PrecompileTools, read its documentation thoroughly, and try using it to solve latency problems.

This package, SnoopCompile, should be considered when:

  • precompilation doesn't reduce TTFX as much as you wish
  • precompilation "works," but only in isolation: as soon as you load (certain) additional packages, TTFX is bad again
  • you're wondering if you can reduce the amount of time needed to precompile your package and/or the size of the precompilation cache files

In other words, SnoopCompile is a diagnostic package that helps reveal the causes of latency. Historically, it preceded PrecompileTools, and indeed PrecompileTools was split out from SnoopCompile. Today, SnoopCompile is generally needed only when PrecompileTools fails to deliver the desired benefits.

SnoopCompile analysis modes

SnoopCompile "snoops" on the Julia compiler, collecting information that may be useful to developers. Here are some of the things you can do with SnoopCompile:

  • diagnose invalidations, cases where Julia must throw away previously-compiled code (see Tutorial on @snoop_invalidations)
  • trace inference, to learn what code is being newly (or freshly) analyzed in an early stage of the compilation pipeline (Tutorial on @snoop_inference)
  • trace code generation by LLVM, a late stage in the compilation pipeline (Tutorial on @snoop_llvm)
  • reveal methods with excessive numbers of compiler-generated specializations, a.k.a. profile-guided despecialization (Tutorial on PGDS)
  • integrate with tools like JET to further reduce the risk that your lovingly-precompiled code will be invalidated by loading other packages (Tutorial on JET integration)

Background information

If nothing else, you should know this:

  • invalidations occur when you load code (e.g., using MyPkg) or otherwise define new methods
  • inference and other stages of compilation occur the first time you run code for a particular combination of input types

The individual tutorials briefly explain core concepts. More detail can be found in Understanding SnoopCompile and Julia's compilation pipeline.

Who should use this package

SnoopCompile is intended primarily for package developers who want to improve the experience for their users. It is also recommended for users who are willing to "dig deep" and understand why packages they depend on have high latency. Your experience with latency may be personal, as it can depend on the specific combination of packages you load. If latency troubles you, don't make the assumption that it must be unfixable: you might be the first person affected by that specific cause of latency.

+SnoopCompile.jl · SnoopCompile

SnoopCompile.jl

Julia is fast, but its execution speed depends on optimizing code through compilation. Code must be compiled before you can use it, and unfortunately compilation is slow. This can cause latency the first time you use code: this latency is often called time-to-first-plot (TTFP) or more generally time-to-first-execution (TTFX). If something feels slow the first time you use it, and fast thereafter, you're probably experiencing the latency of compilation. Note that TTFX is distinct from time-to-load (TTL, which refers to the time you spend waiting for using MyPkg to finish), even though both contribute to latency.

Modern versions of Julia can store compiled code to disk (precompilation) to reduce or eliminate latency. Users and developers who are interested in reducing TTFX should first head to PrecompileTools, read its documentation thoroughly, and try using it to solve latency problems.

This package, SnoopCompile, should be considered when:

  • precompilation doesn't reduce TTFX as much as you wish
  • precompilation "works," but only in isolation: as soon as you load (certain) additional packages, TTFX is bad again
  • you're wondering if you can reduce the amount of time needed to precompile your package and/or the size of the precompilation cache files

In other words, SnoopCompile is a diagnostic package that helps reveal the causes of latency. Historically, it preceded PrecompileTools, and indeed PrecompileTools was split out from SnoopCompile. Today, SnoopCompile is generally needed only when PrecompileTools fails to deliver the desired benefits.

SnoopCompile analysis modes

SnoopCompile "snoops" on the Julia compiler, collecting information that may be useful to developers. Here are some of the things you can do with SnoopCompile:

  • diagnose invalidations, cases where Julia must throw away previously-compiled code (see Tutorial on @snoop_invalidations)
  • trace inference, to learn what code is being newly (or freshly) analyzed in an early stage of the compilation pipeline (Tutorial on @snoop_inference)
  • trace code generation by LLVM, a late stage in the compilation pipeline (Tutorial on @snoop_llvm)
  • reveal methods with excessive numbers of compiler-generated specializations, a.k.a. profile-guided despecialization (Tutorial on PGDS)
  • integrate with tools like JET to further reduce the risk that your lovingly-precompiled code will be invalidated by loading other packages (Tutorial on JET integration)

Background information

If nothing else, you should know this:

  • invalidations occur when you load code (e.g., using MyPkg) or otherwise define new methods
  • inference and other stages of compilation occur the first time you run code for a particular combination of input types
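
The second point can be observed directly with Base's `@elapsed`. In this sketch `sumsq` is a made-up function; the absolute timings depend on your machine and Julia version, so none are shown.

```julia
# `sumsq` is a hypothetical function used only for illustration.
sumsq(xs) = sum(x -> x^2, xs)

t_first   = @elapsed sumsq([1.0, 2.0, 3.0])  # first call with Vector{Float64}: includes compilation
t_second  = @elapsed sumsq([1.0, 2.0, 3.0])  # same input types: compiled code is reused
t_newtype = @elapsed sumsq([1, 2, 3])        # Vector{Int} is a new combination: compiled again

println((t_first, t_second, t_newtype))
```

Typically the first and third timings are much larger than the second.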

The individual tutorials briefly explain core concepts. More detail can be found in Understanding SnoopCompile and Julia's compilation pipeline.

Who should use this package

SnoopCompile is intended primarily for package developers who want to improve the experience for their users. It is also recommended for users who are willing to "dig deep" and understand why packages they depend on have high latency. Your experience with latency may be personal, as it can depend on the specific combination of packages you load. If latency troubles you, don't make the assumption that it must be unfixable: you might be the first person affected by that specific cause of latency.

diff --git a/dev/reference/index.html b/dev/reference/index.html
index d7cd8753..b9e5cf9f 100644
--- a/dev/reference/index.html
+++ b/dev/reference/index.html
@@ -1,14 +1,14 @@
-Reference · SnoopCompile

Reference

Data collection

SnoopCompileCore.@snoop_invalidationsMacro
invs = @snoop_invalidations expr

Capture method cache invalidations triggered by evaluating expr. invs is a sequence of invalidated Core.MethodInstances together with "explanations," consisting of integers (encoding depth) and strings (documenting the source of an invalidation).

Unless you are working at a low level, you essentially always want to pass invs directly to SnoopCompile.invalidation_trees.

Extended help

invs is in a format where the "reason" comes after the items. Method deletion results in the sequence

[zero or more (mi, "invalidate_mt_cache") pairs..., zero or more (depth1 tree, loctag) pairs..., method, loctag] with loctag = "jl_method_table_disable"

where mi means a MethodInstance. depth1 means a sequence starting at depth=1.

Method insertion results in the sequence

[zero or more (depth0 tree, sig) pairs..., same info as with delete_method except loctag = "jl_method_table_insert"]

The authoritative reference is Julia's own src/gf.c file.

source
SnoopCompileCore.@snoop_inferenceMacro
tinf = @snoop_inference commands;

Produce a profile of julia's type inference, recording the amount of time spent inferring every MethodInstance processed while executing commands. Each fresh entrance to type inference (whether executed directly in commands or because a call was made by runtime-dispatch) also collects a backtrace so the caller can be identified.

tinf is a tree, each node containing data on a particular inference "frame" (the method, argument-type specializations, parameters, and even any constant-propagated values). Each reports the exclusive/inclusive times, where the exclusive time corresponds to the time spent inferring this frame in and of itself, whereas the inclusive time includes the time needed to infer all the callees of this frame.

The top-level node in this profile tree is ROOT. Uniquely, its exclusive time corresponds to the time spent not in julia's type inference (codegen, llvm_opt, runtime, etc).

Working with tinf effectively requires loading SnoopCompile.

Warning

Note the semicolon ; at the end of the @snoop_inference macro call. Because SnoopCompileCore is not permitted to invalidate any code, it cannot define the Base.show methods that pretty-print tinf. Defer inspection of tinf until SnoopCompile has been loaded.

Example

julia> tinf = @snoop_inference begin
+Reference · SnoopCompile

Reference

Data collection

SnoopCompileCore.@snoop_invalidationsMacro
invs = @snoop_invalidations expr

Capture method cache invalidations triggered by evaluating expr. invs is a sequence of invalidated Core.MethodInstances together with "explanations," consisting of integers (encoding depth) and strings (documenting the source of an invalidation).

Unless you are working at a low level, you essentially always want to pass invs directly to SnoopCompile.invalidation_trees.

Extended help

invs is in a format where the "reason" comes after the items. Method deletion results in the sequence

[zero or more (mi, "invalidate_mt_cache") pairs..., zero or more (depth1 tree, loctag) pairs..., method, loctag] with loctag = "jl_method_table_disable"

where mi means a MethodInstance. depth1 means a sequence starting at depth=1.

Method insertion results in the sequence

[zero or more (depth0 tree, sig) pairs..., same info as with delete_method except loctag = "jl_method_table_insert"]

The authoritative reference is Julia's own src/gf.c file.

source
SnoopCompileCore.@snoop_inferenceMacro
tinf = @snoop_inference commands;

Produce a profile of julia's type inference, recording the amount of time spent inferring every MethodInstance processed while executing commands. Each fresh entrance to type inference (whether executed directly in commands or because a call was made by runtime-dispatch) also collects a backtrace so the caller can be identified.

tinf is a tree, each node containing data on a particular inference "frame" (the method, argument-type specializations, parameters, and even any constant-propagated values). Each reports the exclusive/inclusive times, where the exclusive time corresponds to the time spent inferring this frame in and of itself, whereas the inclusive time includes the time needed to infer all the callees of this frame.
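
The relationship between exclusive and inclusive time can be sketched with a toy tree. `ToyFrame` below is a hypothetical type invented for this illustration; it is not SnoopCompile's `InferenceTimingNode`, and the times are made-up values in milliseconds.

```julia
# Toy stand-in for a node in an inference-timing tree (hypothetical type).
struct ToyFrame
    name::String
    exclusive::Float64            # time spent inferring this frame alone (ms)
    children::Vector{ToyFrame}    # callees inferred on behalf of this frame
end

# Inclusive time = this frame's exclusive time plus the inclusive time of all callees.
inclusive(f::ToyFrame) = f.exclusive + sum(inclusive, f.children; init=0.0)

g = ToyFrame("g", 2.0, ToyFrame[])
h = ToyFrame("h", 3.0, ToyFrame[])
f = ToyFrame("f", 1.0, [g, h])

println(inclusive(f))  # prints 6.0: 1.0 for f itself plus 2.0 and 3.0 for its callees
```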

The top-level node in this profile tree is ROOT. Uniquely, its exclusive time corresponds to the time spent not in julia's type inference (codegen, llvm_opt, runtime, etc).

Working with tinf effectively requires loading SnoopCompile.

Warning

Note the semicolon ; at the end of the @snoop_inference macro call. Because SnoopCompileCore is not permitted to invalidate any code, it cannot define the Base.show methods that pretty-print tinf. Defer inspection of tinf until SnoopCompile has been loaded.

Example

julia> tinf = @snoop_inference begin
            sort(rand(100))  # Evaluate some code and profile julia's type inference
-       end;
source
SnoopCompileCore.@snoop_llvmMacro
@snoop_llvm "func_names.csv" "llvm_timings.yaml" begin
     # Commands to execute, in a new process
-end

causes the julia compiler to log timing information for LLVM optimization during the provided commands to the files "func_names.csv" and "llvm_timings.yaml". These files can be used as the input to SnoopCompile.read_snoop_llvm("func_names.csv", "llvm_timings.yaml").

The logs contain the amount of time spent optimizing each "llvm module", and information about each module, where a module is a collection of functions being optimized together.

source

GUIs

SnoopCompile.flamegraphFunction
flamegraph(tinf::InferenceTimingNode; tmin=0.0, excluded_modules=Set([Main]), mode=nothing)

Convert the call tree of inference timings returned from @snoop_inference into a FlameGraph. Returns a FlameGraphs.FlameGraph structure that represents the timing trace recorded for type inference.

Frames that take less than tmin seconds of inclusive time (meaning the total time of the frame and all of its children) will not be included in the resultant FlameGraph. This can be helpful if you have a very big profile, to save on processing time.

Non-precompilable frames are marked in reddish colors. excluded_modules can be used to mark methods defined in modules to which you cannot or do not wish to add precompiles.

mode controls how frames are named in tools like ProfileView. nothing uses the default of just the qualified function name, whereas supplying mode=Dict(method => count) counting the number of specializations of each method will cause the number of specializations to be included in the frame name.

Example

We'll use SnoopCompile.flatten_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.flatten_demo()
+end

causes the Julia compiler to log timing information for LLVM optimization during the provided commands to the files "func_names.csv" and "llvm_timings.yaml". These files can be used as the input to SnoopCompile.read_snoop_llvm("func_names.csv", "llvm_timings.yaml").

The logs contain the amount of time spent optimizing each "llvm module", and information about each module, where a module is a collection of functions being optimized together.

source

GUIs

SnoopCompile.flamegraphFunction
flamegraph(tinf::InferenceTimingNode; tmin=0.0, excluded_modules=Set([Main]), mode=nothing)

Convert the call tree of inference timings returned from @snoop_inference into a FlameGraph. Returns a FlameGraphs.FlameGraph structure that represents the timing trace recorded for type inference.

Frames that take less than tmin seconds of inclusive time (the total time for a frame and all of its children) are not included in the resulting FlameGraph. This can save processing time if you have a very large profile.
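To make the role of inclusive time concrete, here is a minimal sketch in plain Julia. It does not use SnoopCompile: ToyNode, inclusive_time, and kept_frames are hypothetical names invented for this illustration, and the timings are made-up numbers. In this toy, a frame below the threshold is dropped together with its subtree, which is safe because a descendant can never have a larger inclusive time than its ancestor.

```julia
# Toy stand-in for a node in an inference-timing tree (hypothetical, not SnoopCompile's type).
struct ToyNode
    name::String
    exclusive::Float64          # time spent in this frame alone, in seconds
    children::Vector{ToyNode}
end

# Inclusive time: the frame's own time plus that of all of its descendants.
inclusive_time(n::ToyNode) = n.exclusive + sum(inclusive_time, n.children; init=0.0)

# Collect the names of frames whose inclusive time is at least `tmin`.
function kept_frames(n::ToyNode, tmin::Float64)
    inclusive_time(n) < tmin && return String[]
    names = [n.name]
    for child in n.children
        append!(names, kept_frames(child, tmin))
    end
    return names
end

leaf1 = ToyNode("domath", 0.0001, ToyNode[])
leaf2 = ToyNode("dostuff", 0.0004, [leaf1])
root  = ToyNode("ROOT", 0.002, [leaf2, ToyNode("tiny", 0.00001, ToyNode[])])

kept = kept_frames(root, 0.0003)
println(kept)
```

Here dostuff survives because its inclusive time counts the time of its child, while domath and tiny fall below the threshold on their own.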

Non-precompilable frames are marked in reddish colors. excluded_modules can be used to mark methods defined in modules to which you cannot or do not wish to add precompiles.

mode controls how frames are named in tools like ProfileView. nothing uses the default of just the qualified function name, whereas supplying mode=Dict(method => count), where count is the number of specializations of each method, causes the number of specializations to be included in the frame name.

Example

We'll use SnoopCompile.flatten_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.flatten_demo()
 InferenceTimingNode: 0.002148974/0.002767166 on Core.Compiler.Timings.ROOT() with 1 direct children
 
 julia> fg = flamegraph(tinf)
-Node(FlameGraphs.NodeData(ROOT() at typeinfer.jl:75, 0x00, 0:3334431))
julia> ProfileView.view(fg);  # Display the FlameGraph in a package that supports it

You should be able to reconcile the resulting flamegraph to print_tree(tinf) (see flatten).

The empty horizontal periods in the flamegraph correspond to times when something other than inference is running. The total width of the flamegraph is set from the ROOT node.

source
SnoopCompile.pgdsguiFunction
methodref, ax = pgdsgui(tinf::InferenceTimingNode; consts::Bool=true, by=inclusive)
-methodref     = pgdsgui(ax, tinf::InferenceTimingNode; kwargs...)

Create a scatter plot comparing the inference time for all instances of each Method, as captured by tinf (vertical axis), against the run time cost, as estimated by capturing a @profile before calling this function (horizontal axis).

Each dot corresponds to a single method. The face color encodes the number of times that method was inferred, and the edge color corresponds to the fraction of the runtime spent on runtime dispatch (black is 0%, bright red is 100%). Clicking on a dot prints the method (or location, if inlined) to the REPL, and sets methodref[] to that method.

ax is the pyplot axis of the scatterplot.

Compat

pgdsgui depends on PyPlot via Julia extensions. You must load both SnoopCompile and PyPlot for this function to be defined.

source

Analysis of invalidations

SnoopCompile.uinvalidatedFunction
umis = uinvalidated(invlist)

Return the unique invalidated MethodInstances. invlist is obtained from SnoopCompileCore.@snoop_invalidations. This is similar to filtering for MethodInstances in invlist, except that it discards any tagged "invalidate_mt_cache". These can typically be ignored because they are nearly inconsequential: they do not invalidate any compiled code, they only transiently affect an optimization of runtime dispatch.

source
SnoopCompile.invalidation_treesFunction
trees = invalidation_trees(list)

Parse list, as captured by SnoopCompileCore.@snoop_invalidations, into a set of invalidation trees, where parent nodes were called by their children.

Example

julia> f(x::Int)  = 1
+Node(FlameGraphs.NodeData(ROOT() at typeinfer.jl:75, 0x00, 0:3334431))
julia> ProfileView.view(fg);  # Display the FlameGraph in a package that supports it

You should be able to reconcile the resulting flamegraph to print_tree(tinf) (see flatten).

The empty horizontal periods in the flamegraph correspond to times when something other than inference is running. The total width of the flamegraph is set from the ROOT node.

source
SnoopCompile.pgdsguiFunction
methodref, ax = pgdsgui(tinf::InferenceTimingNode; consts::Bool=true, by=inclusive)
+methodref     = pgdsgui(ax, tinf::InferenceTimingNode; kwargs...)

Create a scatter plot comparing the inference time for all instances of each Method, as captured by tinf (vertical axis), against the run time cost, as estimated by capturing a @profile before calling this function (horizontal axis).

Each dot corresponds to a single method. The face color encodes the number of times that method was inferred, and the edge color corresponds to the fraction of the runtime spent on runtime dispatch (black is 0%, bright red is 100%). Clicking on a dot prints the method (or location, if inlined) to the REPL, and sets methodref[] to that method.

ax is the pyplot axis of the scatterplot.

Compat

pgdsgui depends on PyPlot via Julia extensions. You must load both SnoopCompile and PyPlot for this function to be defined.

source

Analysis of invalidations

SnoopCompile.uinvalidatedFunction
umis = uinvalidated(invlist)

Return the unique invalidated MethodInstances. invlist is obtained from SnoopCompileCore.@snoop_invalidations. This is similar to filtering for MethodInstances in invlist, except that it discards any tagged "invalidate_mt_cache". These can typically be ignored because they are nearly inconsequential: they do not invalidate any compiled code, they only transiently affect an optimization of runtime dispatch.

source
SnoopCompile.invalidation_treesFunction
trees = invalidation_trees(list)

Parse list, as captured by SnoopCompileCore.@snoop_invalidations, into a set of invalidation trees, where parent nodes were called by their children.

Example

julia> f(x::Int)  = 1
 f (generic function with 1 method)
 
 julia> f(x::Bool) = 2
@@ -30,14 +30,14 @@
 julia> trees = invalidation_trees(@snoop_invalidations f(::AbstractFloat) = 3)
 1-element Array{SnoopCompile.MethodInvalidations,1}:
  inserting f(::AbstractFloat) in Main at REPL[36]:1 invalidated:
-   mt_backedges: 1: signature Tuple{typeof(f),Any} triggered MethodInstance for applyf(::Array{Any,1}) (1 children) more specific

See the documentation for further details.

source
SnoopCompile.precompile_blockersFunction
staletrees = precompile_blockers(invalidations, tinf::InferenceTimingNode)

Select just those invalidations that contribute to "stale nodes" in tinf, and link them together. This can allow one to identify specific blockers of precompilation for particular MethodInstances.

Example

using SnoopCompileCore
+   mt_backedges: 1: signature Tuple{typeof(f),Any} triggered MethodInstance for applyf(::Array{Any,1}) (1 children) more specific

See the documentation for further details.

source
SnoopCompile.precompile_blockersFunction
staletrees = precompile_blockers(invalidations, tinf::InferenceTimingNode)

Select just those invalidations that contribute to "stale nodes" in tinf, and link them together. This can allow one to identify specific blockers of precompilation for particular MethodInstances.

Example

using SnoopCompileCore
 invalidations = @snoop_invalidations using PkgA, PkgB;
 using SnoopCompile
 trees = invalidation_trees(invalidations)
 tinf = @snoop_inference begin
     some_workload()
 end
-staletrees = precompile_blockers(trees, tinf)

In many cases, this reduces the number of invalidations that require analysis by one or more orders of magnitude.

Info

precompile_blockers is experimental and has not yet been thoroughly vetted by real-world use. Users are encouraged to try it and report any "misses" or unnecessary "hits."

source
SnoopCompile.filtermodFunction
modtrigs = filtermod(mod::Module, mtrigs::AbstractVector{MethodTriggers})

Select just the method-based triggers arising from a particular module.

source
thinned = filtermod(module, trees::AbstractVector{MethodInvalidations}; recursive=false)

Select just the cases of invalidating a method defined in module.

If recursive is false, only the roots of trees are examined (i.e., the proximal source of the invalidation must be in module). If recursive is true, then thinned contains all routes to a method in module.

source
SnoopCompile.findcallerFunction
methinvs = findcaller(method::Method, trees)

Find a path through trees that reaches method. Returns a single MethodInvalidations object.

Examples

Suppose you know that loading package SomePkg triggers invalidation of f(data). You can find the specific source of invalidation as follows:

f(data)                             # run once to force compilation
+staletrees = precompile_blockers(trees, tinf)

In many cases, this reduces the number of invalidations that require analysis by one or more orders of magnitude.

Info

precompile_blockers is experimental and has not yet been thoroughly vetted by real-world use. Users are encouraged to try it and report any "misses" or unnecessary "hits."

source
SnoopCompile.filtermodFunction
modtrigs = filtermod(mod::Module, mtrigs::AbstractVector{MethodTriggers})

Select just the method-based triggers arising from a particular module.

source
thinned = filtermod(module, trees::AbstractVector{MethodInvalidations}; recursive=false)

Select just the cases of invalidating a method defined in module.

If recursive is false, only the roots of trees are examined (i.e., the proximal source of the invalidation must be in module). If recursive is true, then thinned contains all routes to a method in module.

source
SnoopCompile.findcallerFunction
methinvs = findcaller(method::Method, trees)

Find a path through trees that reaches method. Returns a single MethodInvalidations object.

Examples

Suppose you know that loading package SomePkg triggers invalidation of f(data). You can find the specific source of invalidation as follows:

f(data)                             # run once to force compilation
 m = @which f(data)
 using SnoopCompile
 trees = invalidation_trees(@snoop_invalidations using SomePkg)
@@ -54,7 +54,7 @@
 
 julia> findcaller(m, trees)
 inserting ==(x, y::SomeType) in SomeOtherPkg at /path/to/code:100 invalidated:
-   backedges: 1: superseding ==(x, y) in Base at operators.jl:83 with MethodInstance for ==(::Symbol, ::Any) (16 children) more specific
source
SnoopCompile.report_invalidationsFunction
report_invalidations(
+   backedges: 1: superseding ==(x, y) in Base at operators.jl:83 with MethodInstance for ==(::Symbol, ::Any) (16 children) more specific
source
SnoopCompile.report_invalidationsFunction
report_invalidations(
     io::IO = stdout;
     invalidations,
     n_rows::Int = 10,
@@ -68,7 +68,7 @@
 
 using SnoopCompile
 using PrettyTables # to load report_invalidations
-report_invalidations(;invalidations)

Using report_invalidations requires that you first load the PrettyTables.jl package.

source

Analysis of @snoop_inference

SnoopCompile.flattenFunction
flatten(tinf; tmin = 0.0, sortby=exclusive)

Flatten the execution graph of InferenceTimingNodes returned from @snoop_inference into a Vector of InferenceTiming frames, each encoding the time needed for inference of a single MethodInstance. By default, results are sorted by exclusive time (the time for inferring the MethodInstance itself, not including any inference of its callees); other options are sortby=inclusive, which includes the time needed for the callees, or nothing to obtain them in the order they were inferred (depth-first order).

Example

We'll use SnoopCompile.flatten_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.flatten_demo()
+report_invalidations(;invalidations)

Using report_invalidations requires that you first load the PrettyTables.jl package.

source

Analysis of @snoop_inference

SnoopCompile.flattenFunction
flatten(tinf; tmin = 0.0, sortby=exclusive)

Flatten the execution graph of InferenceTimingNodes returned from @snoop_inference into a Vector of InferenceTiming frames, each encoding the time needed for inference of a single MethodInstance. By default, results are sorted by exclusive time (the time for inferring the MethodInstance itself, not including any inference of its callees); other options are sortby=inclusive, which includes the time needed for the callees, or nothing to obtain them in the order they were inferred (depth-first order).
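The following plain-Julia sketch mimics the idea of flattening a timing tree. It is only an illustration under stated assumptions: ToyTiming, toy_inclusive, collect_frames!, and toy_flatten are hypothetical names, sortby is a Symbol here (the real flatten takes exclusive or inclusive), applying tmin to the chosen sort key is an assumption of this toy, and the numbers are invented.

```julia
# Hypothetical stand-in for an inference-timing node (not SnoopCompile's type).
struct ToyTiming
    name::String
    exclusive::Float64
    children::Vector{ToyTiming}
end

toy_inclusive(n::ToyTiming) = n.exclusive + sum(toy_inclusive, n.children; init=0.0)

# Depth-first walk that records one frame per node.
function collect_frames!(out, n::ToyTiming)
    push!(out, (name=n.name, exclusive=n.exclusive, inclusive=toy_inclusive(n)))
    for child in n.children
        collect_frames!(out, child)
    end
    return out
end

# `sortby` is :exclusive, :inclusive, or `nothing` (keep depth-first order).
# In this toy, `tmin` is applied to whichever time is used as the sort key.
function toy_flatten(n::ToyTiming; tmin=0.0, sortby=:exclusive)
    frames = collect_frames!(NamedTuple[], n)
    sortby === nothing && return frames
    key(fr) = getfield(fr, sortby)
    return sort(filter(fr -> key(fr) >= tmin, frames); by=key)
end

domath     = ToyTiming("domath", 0.00014, ToyTiming[])
dostuff    = ToyTiming("dostuff", 0.00009, [domath])
packintype = ToyTiming("packintype", 0.00015, [dostuff])
root       = ToyTiming("ROOT", 0.0024, [packintype])

by_exclusive = [fr.name for fr in toy_flatten(root; tmin=0.0001)]
by_inclusive = [fr.name for fr in toy_flatten(root; tmin=0.0001, sortby=:inclusive)]
in_order     = [fr.name for fr in toy_flatten(root; sortby=nothing)]
```

With these made-up timings, dostuff is absent from by_exclusive but present in by_inclusive, because only its inclusive time exceeds the threshold.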

Example

We'll use SnoopCompile.flatten_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.flatten_demo()
 InferenceTimingNode: 0.002148974/0.002767166 on Core.Compiler.Timings.ROOT() with 1 direct children
 
 julia> using AbstractTrees; print_tree(tinf)
@@ -102,7 +102,7 @@
  InferenceTiming: 0.000136496/0.000136496 on SnoopCompile.FlattenDemo.domath(::Int64)
  InferenceTiming: 9.43e-5/0.00035551200000000005 on SnoopCompile.FlattenDemo.dostuff(::SnoopCompile.FlattenDemo.MyType{Int64})
  InferenceTiming: 0.000150891/0.0006117210000000001 on SnoopCompile.FlattenDemo.packintype(::Int64)
- InferenceTiming: 0.002423543/0.0030352639999999998 on Core.Compiler.Timings.ROOT()

As you can see, sortby affects not just the order but also the selection of frames; with exclusive times, dostuff did not on its own rise above threshold, but it does when using inclusive times.

See also: accumulate_by_source.

source
SnoopCompile.accumulate_by_sourceFunction
accumulate_by_source(flattened; tmin = 0.0, by=exclusive)

Add the inference timings for all MethodInstances of a single Method together. flattened is the output of flatten. Returns a list of (t, method) tuples.

When the accumulated time for a Method is large, but each instance is small, it indicates that it is being inferred for many specializations (which might include specializations with different constants).

Example

We'll use SnoopCompile.flatten_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.flatten_demo()
+ InferenceTiming: 0.002423543/0.0030352639999999998 on Core.Compiler.Timings.ROOT()

As you can see, sortby affects not just the order but also the selection of frames; with exclusive times, dostuff did not on its own rise above threshold, but it does when using inclusive times.

See also: accumulate_by_source.

source
SnoopCompile.accumulate_by_sourceFunction
accumulate_by_source(flattened; tmin = 0.0, by=exclusive)

Add the inference timings for all MethodInstances of a single Method together. flattened is the output of flatten. Returns a list of (t, method) tuples.

When the accumulated time for a Method is large, but each instance is small, it indicates that it is being inferred for many specializations (which might include specializations with different constants).
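As a rough illustration of this aggregation, the sketch below sums per-specialization times by method name using only Base Julia. toy_accumulate and the data are hypothetical; the real function operates on the output of flatten and returns Method objects rather than strings.

```julia
# Each entry pairs an inference time (seconds) with a specialization, written
# as (method name, argument types). Invented data, not real measurements.
timings = [
    (2.0e-4, ("getproperty", "(::MyType{Int64}, ::Symbol)")),
    (1.0e-4, ("domath", "(::Int64)")),
    (3.0e-4, ("getproperty", "(::MyType{Float64}, ::Symbol)")),
    (1.5e-4, ("packintype", "(::Int64)")),
]

# Sum the time over all specializations of each method, keep totals of at
# least `tmin`, and return (t, method) tuples sorted by ascending time.
function toy_accumulate(timings; tmin=0.0)
    totals = Dict{String,Float64}()
    for (t, (method, _)) in timings
        totals[method] = get(totals, method, 0.0) + t
    end
    return sort([(t, m) for (m, t) in totals if t >= tmin])
end

acc = toy_accumulate(timings)
foreach(println, acc)
```

In this toy data, getproperty ends up with the largest total even though neither of its two specializations is expensive on its own, which is the pattern described above.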

Example

We'll use SnoopCompile.flatten_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.flatten_demo()
 InferenceTimingNode: 0.004978/0.005447 on Core.Compiler.Timings.ROOT() with 1 direct children
 
 julia> accumulate_by_source(flatten(tinf))
@@ -113,15 +113,15 @@
  (8.9997e-5, (var"#ctor-self#"::Type{SnoopCompile.FlattenDemo.MyType{T}} where T)(x) @ SnoopCompile.FlattenDemo ~/.julia/dev/SnoopCompile/src/inference_demos.jl:35)
  (9.2256e-5, domath(x) @ SnoopCompile.FlattenDemo ~/.julia/dev/SnoopCompile/src/inference_demos.jl:41)
  (0.000117514, packintype(x) @ SnoopCompile.FlattenDemo ~/.julia/dev/SnoopCompile/src/inference_demos.jl:37)
- (0.004977755, ROOT() @ Core.Compiler.Timings compiler/typeinfer.jl:79)

Compared to the output from flatten, the two inference passes on getproperty have been consolidated into a single aggregate call.

source
mtrigs = accumulate_by_source(Method, itrigs::AbstractVector{InferenceTrigger})

Consolidate inference triggers via their caller method. mtrigs is a vector of Method=>list pairs, where list is a list of InferenceTriggers.

source
loctrigs = accumulate_by_source(itrigs::AbstractVector{InferenceTrigger})

Aggregate inference triggers by location (function, file, and line number) of the caller.

Example

We collect data using the SnoopCompile.itrigs_demo:

julia> itrigs = inference_triggers(SnoopCompile.itrigs_demo())
+ (0.004977755, ROOT() @ Core.Compiler.Timings compiler/typeinfer.jl:79)

Compared to the output from flatten, the two inference passes on getproperty have been consolidated into a single aggregate call.

source
mtrigs = accumulate_by_source(Method, itrigs::AbstractVector{InferenceTrigger})

Consolidate inference triggers via their caller method. mtrigs is a vector of Method=>list pairs, where list is a list of InferenceTriggers.

source
loctrigs = accumulate_by_source(itrigs::AbstractVector{InferenceTrigger})

Aggregate inference triggers by location (function, file, and line number) of the caller.

Example

We collect data using the SnoopCompile.itrigs_demo:

julia> itrigs = inference_triggers(SnoopCompile.itrigs_demo())
 2-element Vector{InferenceTrigger}:
  Inference triggered to call MethodInstance for double(::UInt8) from calldouble1 (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:762) inlined into MethodInstance for calldouble2(::Vector{Vector{Any}}) (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:763)
  Inference triggered to call MethodInstance for double(::Float64) from calldouble1 (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:762) inlined into MethodInstance for calldouble2(::Vector{Vector{Any}}) (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:763)
 
 julia> accumulate_by_source(itrigs)
 1-element Vector{SnoopCompile.LocationTriggers}:
-    calldouble1 at /pathto/SnoopCompile/src/parcel_snoop_inference.jl:762 (2 callees from 1 callers)
source
SnoopCompile.collect_forFunction
list = collect_for(m::Method, tinf::InferenceTimingNode)
-list = collect_for(m::MethodInstance, tinf::InferenceTimingNode)

Collect all InferenceTimingNodes (descendants of tinf) that match m.

source
SnoopCompile.staleinstancesFunction
staleinstances(tinf::InferenceTimingNode)

Return a list of InferenceTimingNodes corresponding to MethodInstances that have "stale" code (specifically, CodeInstances with outdated max_world world ages). These may be a hint that invalidation occurred while running the workload provided to @snoop_inference, and consequently an important origin of (re)inference.

Warning

staleinstances only looks retrospectively for stale code; it does not distinguish whether the code became stale while running @snoop_inference from whether it was already stale before execution commenced.

While staleinstances is recommended as a useful "sanity check" to run before performing a detailed analysis of inference, any serious examination of invalidation should use @snoop_invalidations.

For more information about world age, see https://docs.julialang.org/en/v1/manual/methods/#Redefining-Methods.

source
SnoopCompile.inference_triggersFunction
itrigs = inference_triggers(tinf::InferenceTimingNode; exclude_toplevel=true)

Collect the "triggers" of inference, each a fresh entry into inference via a call dispatched at runtime. All the entries in itrigs are previously uninferred, or are freshly-inferred for specific constant inputs.

exclude_toplevel determines whether calls made from the REPL, include, or test suites are excluded.

Example

We'll use SnoopCompile.itrigs_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.itrigs_demo()
+    calldouble1 at /pathto/SnoopCompile/src/parcel_snoop_inference.jl:762 (2 callees from 1 callers)
source
SnoopCompile.collect_forFunction
list = collect_for(m::Method, tinf::InferenceTimingNode)
+list = collect_for(m::MethodInstance, tinf::InferenceTimingNode)

Collect all InferenceTimingNodes (descendants of tinf) that match m.

source
SnoopCompile.staleinstancesFunction
staleinstances(tinf::InferenceTimingNode)

Return a list of InferenceTimingNodes corresponding to MethodInstances that have "stale" code (specifically, CodeInstances with outdated max_world world ages). These may be a hint that invalidation occurred while running the workload provided to @snoop_inference, and consequently an important origin of (re)inference.

Warning

staleinstances only looks retrospectively for stale code; it does not distinguish whether the code became stale while running @snoop_inference from whether it was already stale before execution commenced.

While staleinstances is recommended as a useful "sanity check" to run before performing a detailed analysis of inference, any serious examination of invalidation should use @snoop_invalidations.

For more information about world age, see https://docs.julialang.org/en/v1/manual/methods/#Redefining-Methods.

source
SnoopCompile.inference_triggersFunction
itrigs = inference_triggers(tinf::InferenceTimingNode; exclude_toplevel=true)

Collect the "triggers" of inference, each a fresh entry into inference via a call dispatched at runtime. All the entries in itrigs are previously uninferred, or are freshly-inferred for specific constant inputs.

exclude_toplevel determines whether calls made from the REPL, include, or test suites are excluded.
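A call is "dispatched at runtime" when inference cannot determine the argument types at the call site. The sketch below shows that situation using only Base Julia; double and calldouble are toy definitions written for this illustration (they are not SnoopCompile's demo module), and Base.return_types is used merely to inspect what inference can conclude.

```julia
double(x) = 2x
calldouble(container) = double(container[1])

# With a concretely typed container, inference knows the element type,
# so the call to `double` can be resolved at compile time.
inferred_concrete = Base.return_types(calldouble, (Vector{Int},))

# With Vector{Any}, the element type is unknown until runtime, so `double`
# must be dispatched at runtime. Any specialization of `double` that has not
# been inferred yet then requires a fresh entry into inference.
inferred_abstract = Base.return_types(calldouble, (Vector{Any},))

println(inferred_concrete)
println(inferred_abstract)
```

Calls like the second one are the kind that show up as inference triggers, and the remedy often lies where the poorly typed container was created rather than at the call itself.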

Example

We'll use SnoopCompile.itrigs_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.itrigs_demo()
 InferenceTimingNode: 0.004490576/0.004711168 on Core.Compiler.Timings.ROOT() with 2 direct children
 
 julia> itrigs = inference_triggers(tinf)
@@ -136,23 +136,23 @@
  >   double(::Float64)
        calldouble1 at /pathto/SnoopCompile/src/inference_demos.jl:86 => calldouble2(::Vector{Vector{Any}}) at /pathto/SnoopCompile/src/inference_demos.jl:87
          calleach(::Vector{Vector{Vector{Any}}}) at /pathto/SnoopCompile/src/inference_demos.jl:88
-...
source
SnoopCompile.trigger_treeFunction
root = trigger_tree(itrigs)

Organize inference triggers itrigs in tree format, grouping items via the call tree.

It is a tree rather than a more general graph due to the fact that caching inference results means that each node gets visited only once.

source
SnoopCompile.suggestFunction
suggest(itrig::InferenceTrigger)

Analyze itrig and attempt to suggest an interpretation or remedy. This returns a structure of type Suggested; the easiest thing to do with the result is to show it; however, you can also filter a list of suggestions.

Example

julia> itrigs = inference_triggers(tinf);
+...
source
SnoopCompile.trigger_treeFunction
root = trigger_tree(itrigs)

Organize inference triggers itrigs in tree format, grouping items via the call tree.

It is a tree rather than a more general graph due to the fact that caching inference results means that each node gets visited only once.

source
SnoopCompile.suggestFunction
suggest(itrig::InferenceTrigger)

Analyze itrig and attempt to suggest an interpretation or remedy. This returns a structure of type Suggested; the easiest thing to do with the result is to show it; however, you can also filter a list of suggestions.

Example

julia> itrigs = inference_triggers(tinf);
 
 julia> sugs = suggest.(itrigs);
 
-julia> sugs_important = filter(!isignorable, sugs)    # discard the ones that probably don't need to be addressed
Warning

Suggestions are approximate at best; most often, the proposed fixes should not be taken literally, but instead taken as a hint about the "outcome" of a particular runtime dispatch incident. The suggestions target calls made with non-inferrable arguments, but often the best place to fix the problem is at an earlier stage in the code, where the argument was first computed.

You can get much deeper insight via ascend (and Cthulhu generally), and even stacktrace is often useful. Suggestions are intended to be a quick and easier-to-comprehend first pass at analyzing an inference trigger.

source
SnoopCompile.callerinstanceFunction
mi = callerinstance(itrig::InferenceTrigger)

Return the MethodInstance mi of the caller in the selected stackframe in itrig.

source
SnoopCompile.callingframeFunction
itrigcaller = callingframe(itrig::InferenceTrigger)

"Step out" one layer of the stacktrace, referencing the caller of the current frame of itrig.

You can retrieve the proximal trigger of inference with InferenceTrigger(itrigcaller).

Example

We collect data using the SnoopCompile.itrigs_demo:

julia> itrig = inference_triggers(SnoopCompile.itrigs_demo())[1]
+julia> sugs_important = filter(!isignorable, sugs)    # discard the ones that probably don't need to be addressed
Warning

Suggestions are approximate at best; most often, the proposed fixes should not be taken literally, but instead taken as a hint about the "outcome" of a particular runtime dispatch incident. The suggestions target calls made with non-inferrable arguments, but often the best place to fix the problem is at an earlier stage in the code, where the argument was first computed.

You can get much deeper insight via ascend (and Cthulhu generally), and even stacktrace is often useful. Suggestions are intended to be a quick and easier-to-comprehend first pass at analyzing an inference trigger.

source
SnoopCompile.callerinstanceFunction
mi = callerinstance(itrig::InferenceTrigger)

Return the MethodInstance mi of the caller in the selected stackframe in itrig.

source
SnoopCompile.callingframeFunction
itrigcaller = callingframe(itrig::InferenceTrigger)

"Step out" one layer of the stacktrace, referencing the caller of the current frame of itrig.

You can retrieve the proximal trigger of inference with InferenceTrigger(itrigcaller).

Example

We collect data using the SnoopCompile.itrigs_demo:

julia> itrig = inference_triggers(SnoopCompile.itrigs_demo())[1]
 Inference triggered to call MethodInstance for double(::UInt8) from calldouble1 (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:762) inlined into MethodInstance for calldouble2(::Vector{Vector{Any}}) (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:763)
 
 julia> itrigcaller = callingframe(itrig)
-Inference triggered to call MethodInstance for double(::UInt8) from calleach (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:764) with specialization MethodInstance for calleach(::Vector{Vector{Vector{Any}}})
source
SnoopCompile.skiphigherorderFunction
itrignew = skiphigherorder(itrig; exact::Bool=false)

Attempt to skip over frames of higher-order functions that take the callee as a function-argument. This can be useful if you're analyzing inference triggers for an entire package and would prefer to assign triggers to package-code rather than Base functions like map!, broadcast, etc.

Example

We collect data using the SnoopCompile.itrigs_higherorder_demo:

julia> itrig = inference_triggers(SnoopCompile.itrigs_higherorder_demo())[1]
+Inference triggered to call MethodInstance for double(::UInt8) from calleach (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:764) with specialization MethodInstance for calleach(::Vector{Vector{Vector{Any}}})
source
SnoopCompile.skiphigherorderFunction
itrignew = skiphigherorder(itrig; exact::Bool=false)

Attempt to skip over frames of higher-order functions that take the callee as a function-argument. This can be useful if you're analyzing inference triggers for an entire package and would prefer to assign triggers to package-code rather than Base functions like map!, broadcast, etc.

Example

We collect data using the SnoopCompile.itrigs_higherorder_demo:

julia> itrig = inference_triggers(SnoopCompile.itrigs_higherorder_demo())[1]
 Inference triggered to call MethodInstance for double(::Float64) from mymap! (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:706) with specialization MethodInstance for mymap!(::typeof(SnoopCompile.ItrigHigherOrderDemo.double), ::Vector{Any}, ::Vector{Any})
 
 julia> callingframe(itrig)      # step out one (non-inlined) frame
 Inference triggered to call MethodInstance for double(::Float64) from mymap (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:710) with specialization MethodInstance for mymap(::typeof(SnoopCompile.ItrigHigherOrderDemo.double), ::Vector{Any})
 
 julia> skiphigherorder(itrig)   # step out to frame that doesn't have `double` as a function-argument
-Inference triggered to call MethodInstance for double(::Float64) from callmymap (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:711) with specialization MethodInstance for callmymap(::Vector{Any})
Warn

By default skiphigherorder is conservative, and insists on being sure that it's the callee being passed to the higher-order function. Higher-order functions that do not get specialized (e.g., with ::Function argument types) will not be skipped over. You can pass exact=false to allow ::Function to also be passed over, but keep in mind that this may falsely skip some frames.

source
SnoopCompile.InferenceTriggerType
InferenceTrigger(callee::MethodInstance, callerframes::Vector{StackFrame}, btidx::Int, bt)

Organize information about the "triggers" of inference. callee is the MethodInstance requiring inference; callerframes, btidx, and bt contain information about the caller. callerframes are the frame(s) of the call site that triggered inference; it's a Vector{StackFrame}, rather than a single StackFrame, due to the possibility that the caller was inlined into something else, in which case the first entry is the direct caller and the last entry corresponds to the MethodInstance into which it was ultimately inlined. btidx is the index in bt, the backtrace collected upon entry into inference, corresponding to callerframes.

InferenceTriggers are created by calling inference_triggers. See also: callerinstance and callingframe.

source
SnoopCompile.runtime_inferencetimeFunction
ridata = runtime_inferencetime(tinf::InferenceTimingNode; consts=true, by=inclusive)
-ridata = runtime_inferencetime(tinf::InferenceTimingNode, profiledata; lidict, consts=true, by=inclusive)

Compare runtime and inference-time on a per-method basis. ridata[m::Method] returns (trun, tinfer, nspecializations), measuring the approximate amount of time spent running m, inferring m, and the number of type-specializations, respectively. trun is estimated from profiling data, which the user is responsible for capturing before the call. Typically tinf is collected via @snoop_inference on the first call (in a fresh session) to a workload, and the profiling data collected on a subsequent call. In some cases you may need to repeat the workload several times to collect enough profiling samples.

profiledata and lidict are obtained from Profile.retrieve().

source
SnoopCompile.parcelFunction
ttot, pcs = SnoopCompile.parcel(tinf::InferenceTimingNode)

Parcel the "root-most" precompilable MethodInstances into separate modules. These can be used to generate precompile directives to cache the results of type-inference, reducing latency on first use.

Loosely speaking, a MethodInstance is precompilable if the module that owns the method also has access to all the types it needs to precompile the instance. When the root node of an entrance to inference is not itself precompilable, parcel examines the children (and possibly, children's children...) until it finds the first node on each branch that is precompilable. MethodInstances are then assigned to the module that owns the method.

ttot is the total inference time; pcs is a list of module => (tmod, pclist) pairs. For each module, tmod is the amount of inference time affiliated with methods owned by that module; pclist is a list of (t, mi) time/MethodInstance tuples.

See also: SnoopCompile.write.

Example

We'll use SnoopCompile.itrigs_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.itrigs_demo()
+Inference triggered to call MethodInstance for double(::Float64) from callmymap (/pathto/SnoopCompile/src/parcel_snoop_inference.jl:711) with specialization MethodInstance for callmymap(::Vector{Any})
Warn

By default skiphigherorder is conservative, and insists on being sure that it's the callee being passed to the higher-order function. Higher-order functions that do not get specialized (e.g., with ::Function argument types) will not be skipped over. You can pass exact=false to allow ::Function to also be passed over, but keep in mind that this may falsely skip some frames.

source
SnoopCompile.InferenceTriggerType
InferenceTrigger(callee::MethodInstance, callerframes::Vector{StackFrame}, btidx::Int, bt)

Organize information about the "triggers" of inference. callee is the MethodInstance requiring inference; callerframes, btidx and bt contain information about the caller. callerframes are the frame(s) of the call site that triggered inference; it's a Vector{StackFrame}, rather than a single StackFrame, due to the possibility that the caller was inlined into something else, in which case the first entry is the direct caller and the last entry corresponds to the MethodInstance into which it was ultimately inlined. btidx is the index in bt, the backtrace collected upon entry into inference, corresponding to callerframes.

InferenceTriggers are created by calling inference_triggers. See also: callerinstance and callingframe.

source
SnoopCompile.runtime_inferencetimeFunction
ridata = runtime_inferencetime(tinf::InferenceTimingNode; consts=true, by=inclusive)
+ridata = runtime_inferencetime(tinf::InferenceTimingNode, profiledata; lidict, consts=true, by=inclusive)

Compare runtime and inference-time on a per-method basis. ridata[m::Method] returns (trun, tinfer, nspecializations), measuring the approximate amount of time spent running m, inferring m, and the number of type-specializations, respectively. trun is estimated from profiling data, which the user is responsible for capturing before the call. Typically tinf is collected via @snoop_inference on the first call (in a fresh session) to a workload, and the profiling data collected on a subsequent call. In some cases you may need to repeat the workload several times to collect enough profiling samples.

profiledata and lidict are obtained from Profile.retrieve().

source
SnoopCompile.parcelFunction
ttot, pcs = SnoopCompile.parcel(tinf::InferenceTimingNode)

Parcel the "root-most" precompilable MethodInstances into separate modules. These can be used to generate precompile directives to cache the results of type-inference, reducing latency on first use.

Loosely speaking, a MethodInstance is precompilable if the module that owns the method also has access to all the types it needs to precompile the instance. When the root node of an entrance to inference is not itself precompilable, parcel examines the children (and possibly, children's children...) until it finds the first node on each branch that is precompilable. MethodInstances are then assigned to the module that owns the method.

ttot is the total inference time; pcs is a list of module => (tmod, pclist) pairs. For each module, tmod is the amount of inference time affiliated with methods owned by that module; pclist is a list of (t, mi) time/MethodInstance tuples.

See also: SnoopCompile.write.

Example

We'll use SnoopCompile.itrigs_demo, which runs @snoop_inference on a workload designed to yield reproducible results:

julia> tinf = SnoopCompile.itrigs_demo()
 InferenceTimingNode: 0.004490576/0.004711168 on Core.Compiler.Timings.ROOT() with 2 direct children
 
 julia> ttot, pcs = SnoopCompile.parcel(tinf);
@@ -162,7 +162,7 @@
 
 julia> pcs
 1-element Vector{Pair{Module, Tuple{Float64, Vector{Tuple{Float64, Core.MethodInstance}}}}}:
- SnoopCompile.ItrigDemo => (0.000220592, [(9.8986e-5, MethodInstance for double(::Float64)), (0.000121606, MethodInstance for double(::UInt8))])

Since there was only one module, ttot is the same as tmod. The ItrigDemo module had two precompilable MethodInstances, each listed with its corresponding inclusive time.

source
modtrigs = SnoopCompile.parcel(mtrigs::AbstractVector{MethodTriggers})

Split method-based triggers into collections organized by the module in which the methods were defined. Returns a module => list vector, with the module having the most MethodTriggers last.

source
SnoopCompile.writeFunction
write(prefix::AbstractString, pc::Dict; always::Bool = false)

Write each module's precompiles to a separate file. If always is true, the generated function will always run the precompile statements when called; otherwise the statements will only be called during package precompilation.

source

Analysis of LLVM

SnoopCompile.read_snoop_llvmFunction
times, info = SnoopCompile.read_snoop_llvm("func_names.csv", "llvm_timings.yaml"; tmin_secs=0.0)

Reads the log file produced by the compiler and returns the structured representations.

The results will only contain modules that took longer than tmin_secs to optimize.

Return value

  • times contains the time spent optimizing each module, as a Pair from the time to an array of Strings, one for every MethodInstance in that LLVM module.
  • info is a Dict containing statistics for each MethodInstance encountered, from before and after optimization, including number of instructions and number of basicblocks.

Example

julia> @snoop_llvm "func_names.csv" "llvm_timings.yaml" begin
+ SnoopCompile.ItrigDemo => (0.000220592, [(9.8986e-5, MethodInstance for double(::Float64)), (0.000121606, MethodInstance for double(::UInt8))])

Since there was only one module, ttot is the same as tmod. The ItrigDemo module had two precompilable MethodInstances, each listed with its corresponding inclusive time.
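
As a quick check of how tmod relates to pclist, the sketch below sums the per-instance times copied from the output above. Plain strings stand in for the MethodInstances; that substitution is an assumption made only so the snippet runs without SnoopCompile loaded.

```julia
# Stand-in for the `pclist` printed above: (time, MethodInstance) tuples,
# with strings replacing the real MethodInstance objects (assumption).
pclist = [(9.8986e-5, "double(::Float64)"), (0.000121606, "double(::UInt8)")]

# tmod is the inference time affiliated with the module: the sum of the listed times.
tmod = sum(first, pclist)

# Matches the value reported for SnoopCompile.ItrigDemo, up to floating-point rounding.
println(isapprox(tmod, 0.000220592; atol=1e-12))
```

With real data, the same `sum(first, pclist)` pattern should apply to each `module => (tmod, pclist)` pair in pcs.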

source
modtrigs = SnoopCompile.parcel(mtrigs::AbstractVector{MethodTriggers})

Split method-based triggers into collections organized by the module in which the methods were defined. Returns a module => list vector, with the module having the most MethodTriggers last.

source
SnoopCompile.writeFunction
write(prefix::AbstractString, pc::Dict; always::Bool = false)

Write each module's precompiles to a separate file. If always is true, the generated function will always run the precompile statements when called; otherwise the statements will only be called during package precompilation.

source

Analysis of LLVM

SnoopCompile.read_snoop_llvmFunction
times, info = SnoopCompile.read_snoop_llvm("func_names.csv", "llvm_timings.yaml"; tmin_secs=0.0)

Reads the log file produced by the compiler and returns the structured representations.

The results will only contain modules that took longer than tmin_secs to optimize.

Return value

  • times contains the time spent optimizing each module, as a Pair from the time to an array of Strings, one for every MethodInstance in that LLVM module.
  • info is a Dict containing statistics for each MethodInstance encountered, from before and after optimization, including number of instructions and number of basicblocks.

Example

julia> @snoop_llvm "func_names.csv" "llvm_timings.yaml" begin
            using InteractiveUtils
            @eval InteractiveUtils.peakflops()
        end
@@ -181,7 +181,7 @@
 Dict{String, NamedTuple{(:before, :after), Tuple{NamedTuple{(:instructions, :basicblocks), Tuple{Int64, Int64}}, NamedTuple{(:instructions, :basicblocks), Tuple{Int64, Int64}}}}} with 3 entries:
   "Tuple{typeof(LinearAlgebra.copy_transpose!), Ar… => (before = (instructions = 651, basicblocks = 83), after = (instructions = 348, basicblocks = 40…
   "Tuple{typeof(Base.copyto!), Array{Float64, 2}, … => (before = (instructions = 617, basicblocks = 77), after = (instructions = 397, basicblocks = 37…
-  "Tuple{typeof(LinearAlgebra._generic_matmatmul!)… => (before = (instructions = 4796, basicblocks = 824), after = (instructions = 1421, basicblocks =…
source

Demos

SnoopCompile.flatten_demoFunction
tinf = SnoopCompile.flatten_demo()

A simple demonstration of @snoop_inference. This demo defines a module

module FlattenDemo
+  "Tuple{typeof(LinearAlgebra._generic_matmatmul!)… => (before = (instructions = 4796, basicblocks = 824), after = (instructions = 1421, basicblocks =…
source

Demos

SnoopCompile.flatten_demoFunction
tinf = SnoopCompile.flatten_demo()

A simple demonstration of @snoop_inference. This demo defines a module

module FlattenDemo
     struct MyType{T} x::T end
     extract(y::MyType) = y.x
     function packintype(x)
@@ -193,14 +193,14 @@
         return y*x + 2*x + 5
     end
     dostuff(y) = domath(extract(y))
-end

It then "warms up" (forces inference on) all of Julia's Base methods needed for domath, to ensure that these MethodInstances do not need to be inferred when we collect the data. It then returns the results of

@snoop_inference FlattenDemo.packintype(1)

See flatten for an example usage.

source
SnoopCompile.itrigs_demoFunction
tinf = SnoopCompile.itrigs_demo()

A simple demonstration of collecting inference triggers. This demo defines a module

module ItrigDemo
+end

It then "warms up" (forces inference on) all of Julia's Base methods needed for domath, to ensure that these MethodInstances do not need to be inferred when we collect the data. It then returns the results of

@snoop_inference FlattenDemo.packintype(1)

See flatten for an example usage.

source
SnoopCompile.itrigs_demoFunction
tinf = SnoopCompile.itrigs_demo()

A simple demonstration of collecting inference triggers. This demo defines a module

module ItrigDemo
 @noinline double(x) = 2x
 @inline calldouble1(c) = double(c[1])
 calldouble2(cc) = calldouble1(cc[1])
 calleach(ccs) = (calldouble2(ccs[1]), calldouble2(ccs[2]))
 end

It then "warms up" (forces inference on) calldouble2(::Vector{Vector{Any}}), calldouble1(::Vector{Any}), double(::Int):

cc = [Any[1]]
 ItrigDemo.calleach([cc,cc])

Then it collects and returns inference data using

cc1, cc2 = [Any[0x01]], [Any[1.0]]
-@snoop_inference ItrigDemo.calleach([cc1, cc2])

This does not require any new inference for calldouble2 or calldouble1, but it does force inference on double with two new types. See inference_triggers to see what gets collected and returned.

source
SnoopCompile.itrigs_higherorder_demoFunction
tinf = SnoopCompile.itrigs_higherorder_demo()

A simple demonstration of handling higher-order methods with inference triggers. This demo defines a module

module ItrigHigherOrderDemo
+@snoop_inference ItrigDemo.calleach([cc1, cc2])

This does not require any new inference for calldouble2 or calldouble1, but it does force inference on double with two new types. See inference_triggers to see what gets collected and returned.

source
SnoopCompile.itrigs_higherorder_demoFunction
tinf = SnoopCompile.itrigs_higherorder_demo()

A simple demonstration of handling higher-order methods with inference triggers. This demo defines a module

module ItrigHigherOrderDemo
 double(x) = 2x
 @noinline function mymap!(f, dst, src)
     for i in eachindex(dst, src)
@@ -210,4 +210,4 @@
 end
 @noinline mymap(f::F, src) where F = mymap!(f, Vector{Any}(undef, length(src)), src)
 callmymap(src) = mymap(double, src)
-end

The key feature of this set of definitions is that the function double gets passed as an argument through mymap and mymap! (the latter are higher-order functions).

It then "warms up" (forces inference on) callmymap(::Vector{Any}), mymap(::typeof(double), ::Vector{Any}), mymap!(::typeof(double), ::Vector{Any}, ::Vector{Any}) and double(::Int):

ItrigHigherOrderDemo.callmymap(Any[1, 2])

Then it collects and returns inference data using

@snoop_inference ItrigHigherOrderDemo.callmymap(Any[1.0, 2.0])

which forces inference for double(::Float64).

See skiphigherorder for an example using this demo.

source
+end

The key feature of this set of definitions is that the function double gets passed as an argument through mymap and mymap! (the latter are higher-order functions).

It then "warms up" (forces inference on) callmymap(::Vector{Any}), mymap(::typeof(double), ::Vector{Any}), mymap!(::typeof(double), ::Vector{Any}, ::Vector{Any}) and double(::Int):

ItrigHigherOrderDemo.callmymap(Any[1, 2])

Then it collects and returns inference data using

@snoop_inference ItrigHigherOrderDemo.callmymap(Any[1.0, 2.0])

which forces inference for double(::Float64).

See skiphigherorder for an example using this demo.

source
diff --git a/dev/tutorials/Blackjack/Project.toml b/dev/tutorials/Blackjack/Project.toml index e3001521..11af44b2 100644 --- a/dev/tutorials/Blackjack/Project.toml +++ b/dev/tutorials/Blackjack/Project.toml @@ -1,5 +1,5 @@ name = "Blackjack" -uuid = "e7ee956e-63e0-4d99-9a47-f65ca589ce8f" +uuid = "495d6eb2-adb7-44b3-9b04-06edd18ef5f7" authors = ["runner "] version = "0.1.0" diff --git a/dev/tutorials/BlackjackFacecards/Manifest.toml b/dev/tutorials/BlackjackFacecards/Manifest.toml index d37b87b6..aff1c695 100644 --- a/dev/tutorials/BlackjackFacecards/Manifest.toml +++ b/dev/tutorials/BlackjackFacecards/Manifest.toml @@ -2,12 +2,12 @@ julia_version = "1.11.1" manifest_format = "2.0" -project_hash = "de30f70b9111ad2b362f1c1929ffb2b4cf85988d" +project_hash = "e15908031510810c11414b9ad88781149e1fd1ef" [[deps.Blackjack]] deps = ["PrecompileTools"] path = "/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/Blackjack" -uuid = "e7ee956e-63e0-4d99-9a47-f65ca589ce8f" +uuid = "495d6eb2-adb7-44b3-9b04-06edd18ef5f7" version = "0.1.0" [[deps.Dates]] diff --git a/dev/tutorials/BlackjackFacecards/Project.toml b/dev/tutorials/BlackjackFacecards/Project.toml index 96489bf6..15f1bdfb 100644 --- a/dev/tutorials/BlackjackFacecards/Project.toml +++ b/dev/tutorials/BlackjackFacecards/Project.toml @@ -1,7 +1,7 @@ name = "BlackjackFacecards" -uuid = "6b9dbbf6-5610-491c-9b69-07854eb7c95f" +uuid = "7483e28d-947b-4029-9dd0-e6b6208f7528" authors = ["runner "] version = "0.1.0" [deps] -Blackjack = "e7ee956e-63e0-4d99-9a47-f65ca589ce8f" +Blackjack = "495d6eb2-adb7-44b3-9b04-06edd18ef5f7" diff --git a/dev/tutorials/invalidations/index.html b/dev/tutorials/invalidations/index.html index d395ada8..3d3e3b47 100644 --- a/dev/tutorials/invalidations/index.html +++ b/dev/tutorials/invalidations/index.html @@ -15,14 +15,14 @@ [fa267f1f] + TOML v1.0.3 [4ec0a83e] + Unicode v1.11.0 Precompiling project... 
- 521.8 ms ✓ Blackjack + 591.6 ms ✓ Blackjack 1 dependency successfully precompiled in 1 seconds. 6 already precompiled.
julia> Pkg.generate("BlackjackFacecards"); Generating project BlackjackFacecards: BlackjackFacecards/Project.toml BlackjackFacecards/src/BlackjackFacecards.jl
julia> Pkg.activate("BlackjackFacecards") Activating project at `~/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/BlackjackFacecards`
julia> Pkg.develop(PackageSpec(path=joinpath(pwd(), "Blackjack"))); Resolving package versions... Updating `~/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/BlackjackFacecards/Project.toml` - [e7ee956e] + Blackjack v0.1.0 `~/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/Blackjack` + [495d6eb2] + Blackjack v0.1.0 `~/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/Blackjack` Updating `~/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/BlackjackFacecards/Manifest.toml` - [e7ee956e] + Blackjack v0.1.0 `~/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/Blackjack` + [495d6eb2] + Blackjack v0.1.0 `~/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/Blackjack` [aea7be01] + PrecompileTools v1.2.1 [21216c6a] + Preferences v1.4.3 [ade2ca70] + Dates v1.11.0 @@ -79,10 +79,10 @@ end """)214
Warning

Because BlackjackFacecards "owns" neither Char nor score, this is piracy and should generally be avoided. Piracy is one way to cause invalidations, but it's not the only one. BlackjackFacecards could avoid committing piracy by defining a struct Facecard ... end and defining score(card::Facecard) instead of score(card::Char). However, this would not fix the invalidations: all the factors described below are unchanged.

Now we're ready!

Recording invalidations

Here are the steps executed by the code below

julia> using SnoopCompileCore
julia> invs = @snoop_invalidations using Blackjack, BlackjackFacecards;Precompiling Blackjack... - 635.8 msBlackjack + 647.1 msBlackjack 1 dependency successfully precompiled in 1 seconds. 6 already precompiled. Precompiling BlackjackFacecards... - 473.8 msBlackjackFacecards + 466.3 msBlackjackFacecards 1 dependency successfully precompiled in 0 seconds. 7 already precompiled.
julia> using SnoopCompile, AbstractTrees
Tip

If you get errors like Package SnoopCompileCore not found in current path, a likely explanation is that you didn't add it to your default environment. In the example above, we're in the BlackjackFacecards environment so we can develop the package, but you also need access to SnoopCompile and SnoopCompileCore. Having these in your default environment lets them be found even if they aren't part of the current environment.

Analyzing invalidations

Now we're ready to see what, if anything, got invalidated:

julia> trees = invalidation_trees(invs)1-element Vector{SnoopCompile.MethodInvalidations}:
  inserting score(card::Char) @ BlackjackFacecards ~/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/BlackjackFacecards/src/BlackjackFacecards.jl:6 invalidated:
    mt_backedges: 1: signature Tuple{typeof(Blackjack.score), Any} triggered MethodInstance for Blackjack.tallyscores(::Vector{Any}) (1 children)

This has only one "tree" of invalidations. trees is a Vector so we can index it:

julia> tree = trees[1]inserting score(card::Char) @ BlackjackFacecards ~/work/SnoopCompile.jl/SnoopCompile.jl/docs/build/tutorials/BlackjackFacecards/src/BlackjackFacecards.jl:6 invalidated:
@@ -98,4 +98,4 @@
         s += invokelatest(score, card)
     end
     return s
-end

This forces Julia to always look up the appropriate method of score while the code is running, and thus prevents the speculative optimizations that leave the code vulnerable to invalidation. However, the cost is that your code may run somewhat more slowly, particularly here where the call is inside a loop.

If you plan to define at least two score methods, another way to turn off this optimization would be to declare

Base.Experimental.@max_methods 1 function score end

before defining any score methods. You can read the documentation on @max_methods to learn more about how it works.

Tip

Most of us learn best by doing. Try at least one of these methods of fixing the invalidation, and use SnoopCompile to verify that it works.

Undoing the damage from invalidations

If you can't prevent the invalidation, an alternative approach is to recompile the invalidated code. For example, one could repeat the precompile workload from Blackjack in BlackjackFacecards. While this will mean that the whole "stack" will be compiled twice and cached twice (which is wasteful), it should be effective in reducing latency for users.

PrecompileTools also has a @recompile_invalidations macro. This isn't generally recommended for use in packages (you can end up with long compile times for things you don't need), but it can be useful in personal "Startup packages" where you want to reduce latency for a particular project you're working on. See the PrecompileTools documentation for details.

  Activating project at `~/work/SnoopCompile.jl/SnoopCompile.jl/docs`
+end

This forces Julia to always look up the appropriate method of score while the code is running, and thus prevents the speculative optimizations that leave the code vulnerable to invalidation. However, the cost is that your code may run somewhat more slowly, particularly here where the call is inside a loop.

If you plan to define at least two score methods, another way to turn off this optimization would be to declare

Base.Experimental.@max_methods 1 function score end

before defining any score methods. You can read the documentation on @max_methods to learn more about how it works.

Tip

Most of us learn best by doing. Try at least one of these methods of fixing the invalidation, and use SnoopCompile to verify that it works.

Undoing the damage from invalidations

If you can't prevent the invalidation, an alternative approach is to recompile the invalidated code. For example, one could repeat the precompile workload from Blackjack in BlackjackFacecards. While this will mean that the whole "stack" will be compiled twice and cached twice (which is wasteful), it should be effective in reducing latency for users.

PrecompileTools also has a @recompile_invalidations macro. This isn't generally recommended for use in packages (you can end up with long compile times for things you don't need), but it can be useful in personal "Startup packages" where you want to reduce latency for a particular project you're working on. See the PrecompileTools documentation for details.

  Activating project at `~/work/SnoopCompile.jl/SnoopCompile.jl/docs`
diff --git a/dev/tutorials/jet/index.html b/dev/tutorials/jet/index.html index 22545fc7..66c4c32f 100644 --- a/dev/tutorials/jet/index.html +++ b/dev/tutorials/jet/index.html @@ -78,4 +78,4 @@ │││││││││││││││┌ reduce_empty(::typeof(+), ::Type{Any}) @ Base ./reduce.jl:343 ││││││││││││││││┌ zero(::Type{Any}) @ Base ./missing.jl:106 │││││││││││││││││ MethodError: no method matching zero(::Type{Any}): Base.throw(Base.MethodError(zero, tuple(Base.Any)::Tuple{DataType})::MethodError) -││││││││││││││││└────────────────────

Because SnoopCompileCore collected the runtime-dispatched sum call, we can pass it to JET. report_callees filters those calls which generate JET reports, allowing you to focus on potential errors.

Note

JET integration is enabled only if JET.jl and Cthulhu.jl have been loaded into your main session. This is why there's the using JET, Cthulhu statement included in the example given.

+││││││││││││││││└────────────────────

Because SnoopCompileCore collected the runtime-dispatched sum call, we can pass it to JET. report_callees filters those calls which generate JET reports, allowing you to focus on potential errors.

Note

JET integration is enabled only if JET.jl and Cthulhu.jl have been loaded into your main session. This is why there's the using JET, Cthulhu statement included in the example given.

diff --git a/dev/tutorials/llvm_timings.yaml b/dev/tutorials/llvm_timings.yaml index 05b5eb7c..44047aff 100644 --- a/dev/tutorials/llvm_timings.yaml +++ b/dev/tutorials/llvm_timings.yaml @@ -3,7 +3,7 @@ "julia_NamedTuple_533": instructions: 25 basicblocks: 1 - time_ns: 1014653 + time_ns: 1063097 optlevel: 2 after: "julia_NamedTuple_533": @@ -14,7 +14,7 @@ "julia_peakflops_536": instructions: 31 basicblocks: 1 - time_ns: 806236 + time_ns: 835562 optlevel: 1 after: "julia_peakflops_536": @@ -25,7 +25,7 @@ "julia_#peakflops#81_546": instructions: 62 basicblocks: 1 - time_ns: 690031 + time_ns: 828489 optlevel: 1 after: "julia_#peakflops#81_546": @@ -36,7 +36,7 @@ "julia_Signed_558": instructions: 47 basicblocks: 7 - time_ns: 986050 + time_ns: 1189373 optlevel: 2 after: "julia_Signed_558": @@ -47,7 +47,7 @@ "julia_getproperty_569": instructions: 45 basicblocks: 3 - time_ns: 1039179 + time_ns: 2252851 optlevel: 2 after: "julia_getproperty_569": @@ -55,7 +55,7 @@ basicblocks: 3 - before: - time_ns: 90918 + time_ns: 252883 optlevel: 2 after: - @@ -63,7 +63,7 @@ "julia_mapreduce_impl_651": instructions: 1043 basicblocks: 124 - time_ns: 9162516 + time_ns: 10682147 optlevel: 2 after: "julia_mapreduce_impl_651": @@ -74,7 +74,7 @@ "julia_#peakflops#349_584": instructions: 830 basicblocks: 100 - time_ns: 8791417 + time_ns: 9584756 optlevel: 2 after: "julia_#peakflops#349_584": @@ -85,7 +85,7 @@ "julia_peakflops_579": instructions: 27 basicblocks: 1 - time_ns: 580497 + time_ns: 767144 optlevel: 2 after: "julia_peakflops_579": @@ -96,7 +96,7 @@ "julia_throw_boundserror_681": instructions: 27 basicblocks: 3 - time_ns: 543709 + time_ns: 679120 optlevel: 2 after: "julia_throw_boundserror_681": @@ -107,7 +107,7 @@ "julia_ones_657": instructions: 265 basicblocks: 32 - time_ns: 4106041 + time_ns: 4518516 optlevel: 2 after: "julia_ones_657": @@ -115,7 +115,7 @@ basicblocks: 14 - before: - time_ns: 142685 + time_ns: 144329 optlevel: 2 after: - @@ -123,7 +123,7 @@ 
"julia_throw_boundserror_815": instructions: 38 basicblocks: 3 - time_ns: 1037235 + time_ns: 1039242 optlevel: 2 after: "julia_throw_boundserror_815": @@ -134,7 +134,7 @@ "julia_matmul2x2!_821": instructions: 10425 basicblocks: 1122 - time_ns: 11045200 + time_ns: 11073849 optlevel: 2 after: "julia_matmul2x2!_821": @@ -145,7 +145,7 @@ "julia_matmul3x3!_806": instructions: 22969 basicblocks: 2452 - time_ns: 19599557 + time_ns: 19759084 optlevel: 2 after: "julia_matmul3x3!_806": @@ -156,7 +156,7 @@ "julia_gemm!_784": instructions: 1069 basicblocks: 148 - time_ns: 5925397 + time_ns: 6166727 optlevel: 2 after: "julia_gemm!_784": @@ -167,7 +167,7 @@ "julia_throw_uplo_781": instructions: 32 basicblocks: 3 - time_ns: 833978 + time_ns: 807199 optlevel: 2 after: "julia_throw_uplo_781": @@ -178,7 +178,7 @@ "julia_wrap_745": instructions: 1322 basicblocks: 99 - time_ns: 10555020 + time_ns: 10924570 optlevel: 2 after: "julia_wrap_745": @@ -189,7 +189,7 @@ "julia_gemm_wrapper!_700": instructions: 1109 basicblocks: 148 - time_ns: 7873083 + time_ns: 8441569 optlevel: 2 after: "julia_gemm_wrapper!_700": @@ -200,7 +200,7 @@ "julia_*_684": instructions: 151 basicblocks: 15 - time_ns: 2084810 + time_ns: 2171879 optlevel: 2 after: "julia_*_684": @@ -211,7 +211,7 @@ "julia_getindex_824": instructions: 189 basicblocks: 18 - time_ns: 1839173 + time_ns: 1880475 optlevel: 2 after: "julia_getindex_824": diff --git a/dev/tutorials/pgdsgui/index.html b/dev/tutorials/pgdsgui/index.html index 9645e980..3418064c 100644 --- a/dev/tutorials/pgdsgui/index.html +++ b/dev/tutorials/pgdsgui/index.html @@ -83,4 +83,4 @@ MethodInstance for save(::String, ::Vector{SomePkg.SomeDataType{SubString{String}}}) MethodInstance for save(::SubString{String}, ::Array) MethodInstance for save(::String, ::Vector{var"#s92"} where var"#s92"<:SomePkg.SomeDataType) - MethodInstance for save(::String, ::Array)

In this case we have 7 MethodInstances (some of which are clearly due to poor inferrability of the caller) when one might suffice.

+ MethodInstance for save(::String, ::Array)

In this case we have 7 MethodInstances (some of which are clearly due to poor inferrability of the caller) when one might suffice.

diff --git a/dev/tutorials/snoop_inference/index.html b/dev/tutorials/snoop_inference/index.html index 11f5ca42..3673f9f1 100644 --- a/dev/tutorials/snoop_inference/index.html +++ b/dev/tutorials/snoop_inference/index.html @@ -45,4 +45,4 @@ MethodInstance for FlattenDemo.packintype(::Int64)

Each node in this tree is accompanied by a pair of numbers. The first number is the exclusive inference time (in seconds), meaning the time spent inferring the particular MethodInstance, not including the time spent inferring its callees. The second number is the inclusive time, which is the exclusive time plus the time spent on the callees. Therefore, the inclusive time is always at least as large as the exclusive time.
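
To make the exclusive/inclusive relationship concrete, here is a minimal sketch on a toy tree. ToyNode and inclusive_time are hypothetical names used only for illustration (they are not SnoopCompile's types or functions), and the times are made-up values in arbitrary units.

```julia
# Hypothetical stand-in for an inference timing tree (not SnoopCompile's type).
struct ToyNode
    name::String
    exclusive::Float64          # time spent on this node alone
    children::Vector{ToyNode}   # callees inferred from this node
end
ToyNode(name, exclusive) = ToyNode(name, exclusive, ToyNode[])

# Inclusive time = own exclusive time + inclusive time of every callee.
inclusive_time(node::ToyNode) =
    node.exclusive + sum(inclusive_time, node.children; init=0.0)

leaf  = ToyNode("leaf", 1.0)
mid   = ToyNode("mid", 2.0, [leaf])
other = ToyNode("other", 0.25)
root  = ToyNode("root", 0.5, [mid, other])

println(inclusive_time(mid))   # 3.0  (2.0 own + 1.0 from leaf)
println(inclusive_time(root))  # 3.75 (0.5 own + 3.0 + 0.25)
```

Because each callee contributes a non-negative amount, the inclusive time can never be smaller than the exclusive time, which is the property stated above.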

The ROOT node is a bit different: its exclusive time measures the time spent on all operations except inference. In this case, we see that the entire call took approximately 3.3ms, of which 2.7ms was spent on activities besides inference. Almost all of that was code-generation, but it also includes the time needed to run the code. Just 0.55ms was needed to run type-inference on this entire series of calls. As you will quickly discover, inference takes much more time on more complicated code.

We can also display this tree as a flame graph, using the ProfileView.jl package:

julia> fg = flamegraph(tinf)
 Node(FlameGraphs.NodeData(ROOT() at typeinfer.jl:75, 0x00, 0:10080857))
julia> using ProfileView
 
-julia> ProfileView.view(fg)

You should see something like this:

flamegraph

Users are encouraged to read the ProfileView documentation to understand how to interpret this, but briefly:

You can explore this flamegraph and compare it to the output from print_tree.

Note

Orange-yellow boxes that appear at the base of a flame are worth special attention, and may represent something that you thought you had precompiled. For example, suppose your workload "exercises" myfun(args...; warn=true), so you might think you have myfun covered for the corresponding argument types. But constant-propagation (as indicated by the orange-yellow coloration) results in (re)compilation for specific values: if Julia has decided that myfun merits constant-propagation, a call myfun(args...; warn=false) might need to be compiled separately.

When you want to prevent constant-propagation from hurting your TTFX, you have two options:

  • Precompile for all relevant argument values as well as types. The most common argument types to trigger Julia's constprop heuristics are numbers (Bool/Int/etc) and Symbol.
  • Disable constant-propagation for this method by adding Base.@constprop :none in front of your definition of myfun. Constant-propagation can be a big performance boost when it changes how performance-sensitive code is optimized for specific input values, but when this doesn't apply you can safely disable it.

Finally, flatten, on its own or together with accumulate_by_source, allows you to get a sense for the cost of individual MethodInstances or Methods.

The tools here allow you to get an overview of where inference is spending its time. This gives you insight into the major contributors to latency.

+julia> ProfileView.view(fg)

You should see something like this:

flamegraph

Users are encouraged to read the ProfileView documentation to understand how to interpret this, but briefly:

You can explore this flamegraph and compare it to the output from print_tree.

Note

Orange-yellow boxes that appear at the base of a flame are worth special attention, and may represent something that you thought you had precompiled. For example, suppose your workload "exercises" myfun(args...; warn=true), so you might think you have myfun covered for the corresponding argument types. But constant-propagation (as indicated by the orange-yellow coloration) results in (re)compilation for specific values: if Julia has decided that myfun merits constant-propagation, a call myfun(args...; warn=false) might need to be compiled separately.

When you want to prevent constant-propagation from hurting your TTFX, you have two options:

  • Precompile for all relevant argument values as well as types. The most common argument types to trigger Julia's constprop heuristics are numbers (Bool/Int/etc) and Symbol.
  • Disable constant-propagation for this method by adding Base.@constprop :none in front of your definition of myfun. Constant-propagation can be a big performance boost when it changes how performance-sensitive code is optimized for specific input values, but when this doesn't apply you can safely disable it.
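
The second option might look like the sketch below. The body of myfun is invented purely for illustration (only the name and the warn keyword come from the text above), and Base.@constprop requires a reasonably recent Julia (it is assumed here to be available, as in Julia 1.7 and later).

```julia
# Hypothetical definition of `myfun`; the body is an assumption for illustration.
# `Base.@constprop :none` asks the compiler not to specialize on the *values*
# of the arguments (such as `warn=true` vs `warn=false`), only on their types.
Base.@constprop :none function myfun(x; warn::Bool=true)
    if warn && x < 0
        @warn "negative input" x
    end
    return abs(x)
end

println(myfun(3))                # 3
println(myfun(-2; warn=false))   # 2
```

Whether this helps in practice depends on the method: it is worth re-running @snoop_inference afterwards to check that the orange-yellow boxes for myfun are gone and that runtime performance has not suffered.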

Finally, flatten, on its own or together with accumulate_by_source, allows you to get a sense for the cost of individual MethodInstances or Methods.

The tools here allow you to get an overview of where inference is spending its time. This gives you insight into the major contributors to latency.

diff --git a/dev/tutorials/snoop_inference_analysis/index.html b/dev/tutorials/snoop_inference_analysis/index.html index a566d9a3..6272dccd 100644 --- a/dev/tutorials/snoop_inference_analysis/index.html +++ b/dev/tutorials/snoop_inference_analysis/index.html @@ -1,5 +1,5 @@ -Using @snoop_inference results to improve inferrability · SnoopCompile

Using @snoop_inference results to improve inferrability

Throughout this page, we'll use the OptimizeMe demo, which ships with SnoopCompile.

Note

To understand what follows, it's essential to refer to OptimizeMe source code as you follow along.

julia> using SnoopCompileCore, SnoopCompile # here we need the SnoopCompile path for the next line (normally you should wait until after data collection is complete)
julia> include(joinpath(pkgdir(SnoopCompile), "examples", "OptimizeMe.jl"))
Main.var"Main".OptimizeMe
julia> tinf = @snoop_inference OptimizeMe.main();
lotsa containers:
julia> fg = flamegraph(tinf)
Node(FlameGraphs.NodeData(ROOT() at typeinfer.jl:79, 0x00, 0:1346230948))

If you visualize fg with ProfileView, you may see something like this:

flamegraph-OptimizeMe

From the standpoint of precompilation, this has some obvious problems:

  • even though we called a single method, OptimizeMe.main(), there are many distinct flames separated by blank spaces. This indicates that many calls are being made by runtime dispatch: each separate flame is a fresh entrance into inference.
  • several of the flames are marked in red, indicating that they are not naively precompilable (see the Tutorial on @snoop_inference). While @compile_workload can handle these flames, an even more robust solution is to eliminate them altogether.

Our goal will be to improve the design of OptimizeMe to make it more readily precompilable.

Analyzing inference triggers

We'll first extract the "triggers" of inference, which is just a repackaging of part of the information contained within tinf. Specifically, an InferenceTrigger captures callee/caller relationships that straddle a fresh entrance to type-inference, allowing you to identify which calls were made by runtime dispatch and which MethodInstance they called.

julia> itrigs = inference_triggers(tinf)
37-element Vector{InferenceTrigger}:
+Using @snoop_inference results to improve inferrability · SnoopCompile

Using @snoop_inference results to improve inferrability

Throughout this page, we'll use the OptimizeMe demo, which ships with SnoopCompile.

Note

To understand what follows, it's essential to refer to OptimizeMe source code as you follow along.

julia> using SnoopCompileCore, SnoopCompile # here we need the SnoopCompile path for the next line (normally you should wait until after data collection is complete)
julia> include(joinpath(pkgdir(SnoopCompile), "examples", "OptimizeMe.jl"))
Main.var"Main".OptimizeMe
julia> tinf = @snoop_inference OptimizeMe.main();
lotsa containers:
julia> fg = flamegraph(tinf)
Node(FlameGraphs.NodeData(ROOT() at typeinfer.jl:79, 0x00, 0:1401678903))

If you visualize fg with ProfileView, you may see something like this:

flamegraph-OptimizeMe

From the standpoint of precompilation, this has some obvious problems:

  • even though we called a single method, OptimizeMe.main(), there are many distinct flames separated by blank spaces. This indicates that many calls are being made by runtime dispatch: each separate flame is a fresh entrance into inference.
  • several of the flames are marked in red, indicating that they are not naively precompilable (see the Tutorial on @snoop_inference). While @compile_workload can handle these flames, an even more robust solution is to eliminate them altogether.

Our goal will be to improve the design of OptimizeMe to make it more readily precompilable.

Analyzing inference triggers

We'll first extract the "triggers" of inference, which is just a repackaging of part of the information contained within tinf. Specifically, an InferenceTrigger captures callee/caller relationships that straddle a fresh entrance to type-inference, allowing you to identify which calls were made by runtime dispatch and which MethodInstance they called.

julia> itrigs = inference_triggers(tinf)
37-element Vector{InferenceTrigger}:
  Inference triggered to call Main.var"Main".OptimizeMe.main() from eval (./boot.jl:430) inlined into cd(::Documenter.var"#64#66"{Module}, ::String) (./file.jl:112)
  Inference triggered to call similar(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Type{Main.var"Main".OptimizeMe.Container}, Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, ::Type{Main.var"Main".OptimizeMe.Container{Int64}}) from copy (./broadcast.jl:907) inlined into Main.var"Main".OptimizeMe.lotsa_containers() (/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:15)
  Inference triggered to call setindex!(::Vector{Main.var"Main".OptimizeMe.Container{Int64}}, ::Main.var"Main".OptimizeMe.Container{Int64}, ::Int64) from copy (./broadcast.jl:908) inlined into Main.var"Main".OptimizeMe.lotsa_containers() (/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:15)
@@ -21,11 +21,11 @@
  Inference triggered to call Main.var"Main".OptimizeMe.howbig(::Float64) from #1 (/home/runner/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29) with specialization (::Main.var"Main".OptimizeMe.var"#1#2")(::Float64)
  Inference triggered to call Base.collect_to_with_first!(::Vector{Float64}, ::Float64, ::Base.Generator{Vector{Float64}, Main.var"Main".OptimizeMe.var"#1#2"}, ::Int64) from _collect (./array.jl:810) with specialization Base._collect(::Vector{Float64}, ::Base.Generator{Vector{Float64}, Main.var"Main".OptimizeMe.var"#1#2"}, ::Base.EltypeUnknown, ::Base.HasShape{1})

The number of elements in this Vector{InferenceTrigger} tells you how many calls (1) were made by runtime dispatch and (2) had a callee that had not previously been inferred.
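
To make "runtime dispatch" concrete, here is a small sketch that uses only Base Julia. The Box type and the wrap function are hypothetical stand-ins (loosely modeled on a parametric wrapper like OptimizeMe's Container); the point is only that an argument typed as Any prevents inference from settling on a concrete result:

```julia
# Hypothetical parametric wrapper type, used only for illustration
struct Box{T}
    x::T
end

wrap(v) = Box(v)

# With a concrete argument type, inference can determine the result type
# ahead of time, so the call can be resolved at compile time.
concrete = only(Base.return_types(wrap, (Int,)))

# With an `Any` argument (e.g., an element pulled from a Vector{Any}),
# inference cannot settle on a concrete result type; the specialization to run
# is chosen at run time, and a callee that has not yet been inferred then
# causes a fresh entrance into inference.
abstract_ = only(Base.return_types(wrap, (Any,)))

println(concrete, " is concrete: ", isconcretetype(concrete))
println(abstract_, " is concrete: ", isconcretetype(abstract_))
```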

Tip

In the REPL, SnoopCompile displays InferenceTriggers with yellow coloration for the callee, red for the caller method, and blue for the caller specialization. This makes it easier to quickly identify the most important information.

In some cases, this might indicate that you'll need to fix each case separately; fortunately, in many cases fixing one problem addresses many others.

Method triggers

It's usually most convenient to organize the triggers by the method that triggered the need for inference:

julia> mtrigs = accumulate_by_source(Method, itrigs)
11-element Vector{SnoopCompile.TaggedTriggers{Method}}:
  cd(f::Function, dir::AbstractString) @ Base.Filesystem file.jl:107 (1 callees from 1 callers)
+ (::Main.var"Main".OptimizeMe.var"#1#2")(x) @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29 (1 callees from 1 callers)
  print_matrix_row(io::IO, X::AbstractVecOrMat, A::Vector, i::Integer, cols::AbstractVector, sep::AbstractString, idxlast::Integer) @ Base arrayshow.jl:97 (1 callees from 1 callers)
  display(d::TextDisplay, M::MIME{Symbol("text/plain")}, x) @ Base.Multimedia multimedia.jl:254 (1 callees from 1 callers)
- (::Main.var"Main".OptimizeMe.var"#1#2")(x) @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:29 (1 callees from 1 callers)
- _collect(c, itr, ::Base.EltypeUnknown, isz::Union{Base.HasLength, Base.HasShape}) @ Base array.jl:797 (1 callees from 1 callers)
  typeinfo_prefix(io::IO, X) @ Base arrayshow.jl:562 (1 callees from 1 callers)
+ _collect(c, itr, ::Base.EltypeUnknown, isz::Union{Base.HasLength, Base.HasShape}) @ Base array.jl:797 (1 callees from 1 callers)
  copyto_nonleaf!(dest, bc::Base.Broadcast.Broadcasted, iter, state, count) @ Base.Broadcast broadcast.jl:1071 (2 callees from 1 callers)
  lotsa_containers() @ Main.var"Main".OptimizeMe ~/work/SnoopCompile.jl/SnoopCompile.jl/examples/OptimizeMe.jl:13 (3 callees from 1 callers)
  var"#sprint#592"(context, sizehint::Integer, ::typeof(sprint), f::Function, args...) @ Base strings/io.jl:107 (8 callees from 2 callers)
@@ -110,4 +110,4 @@
 222
 
 julia> length(itrigsel)
-71

While there is some risk of discarding triggers that provide clues about the origin of other triggers (e.g., they would have shown up in the same branch of the trigger_tree), the shorter list may help direct your attention to the "real" issues.

+71

While there is some risk of discarding triggers that provide clues about the origin of other triggers (e.g., they would have shown up in the same branch of the trigger_tree), the shorter list may help direct your attention to the "real" issues.

diff --git a/dev/tutorials/snoop_inference_parcel/index.html b/dev/tutorials/snoop_inference_parcel/index.html index ddad3fa7..cdc31461 100644 --- a/dev/tutorials/snoop_inference_parcel/index.html +++ b/dev/tutorials/snoop_inference_parcel/index.html @@ -1,14 +1,14 @@ -Using @snoop_inference to emit manual precompile directives · SnoopCompile

Using @snoop_inference to emit manual precompile directives

In a few cases, it may be inconvenient or impossible to precompile using a workload. Some examples might be:

  • an application that opens graphical windows
  • an application that connects to a database
  • an application that creates, deletes, or rewrites files on disk

In such cases, one alternative is to create a manual list of precompile directives using Julia's precompile(f, argtypes) function.

Warning

Manual precompile directives are much more likely to "go stale" as the package is developed: precompile does not throw an error if a method for the given argtypes cannot be found. They are also more likely to be dependent on the Julia version, operating system, or CPU architecture. Whenever possible, it's safer to use a workload.

precompile directives have to be emitted by the module that owns the method and/or types. SnoopCompile comes with a tool, parcel, that splits out the "root-most" precompilable MethodInstances into their constituent modules. This will typically correspond to the bottom row of boxes in the flame graph. When some of those boxes are not naively precompilable, the result will include MethodInstances from higher up in the call tree.

Let's use SnoopCompile.parcel on our OptimizeMe demo:

julia> using SnoopCompileCore, SnoopCompile # here we need the SnoopCompile path for the next line (normally you should wait until after data collection is complete)
julia> include(joinpath(pkgdir(SnoopCompile), "examples", "OptimizeMe.jl"))
Main.var"Main".OptimizeMe
julia> tinf = @snoop_inference OptimizeMe.main();
lotsa containers:
julia> ttot, pcs = SnoopCompile.parcel(tinf);
julia> ttot
0.064739833
julia> pcs
4-element Vector{Pair{Module, Tuple{Float64, Vector{Tuple{Float64, Core.MethodInstance}}}}}:
- Core => (1.913e-6, [(1.913e-6, MethodInstance for (NamedTuple{(:sizehint,)})(::Tuple{Int64}))])
- Base.Multimedia => (5.54e-6, [(5.54e-6, MethodInstance for MIME(::String))])
- Base => (0.003207727, [(1.733e-6, MethodInstance for LinearIndices(::Tuple{Base.OneTo{Int64}})), (1.773e-6, MethodInstance for IOContext(::IOBuffer, ::IOContext{Base.PipeEndpoint})), (3.837e-6, MethodInstance for IOContext(::IOContext{Base.PipeEndpoint}, ::Base.ImmutableDict{Symbol, Any})), (5.721e-6, MethodInstance for Base.indexed_iterate(::Tuple{String, Bool}, ::Int64, ::Int64)), (5.981e-6, MethodInstance for Base.indexed_iterate(::Tuple{Int64, Int64}, ::Int64, ::Int64)), (6.131e-6, MethodInstance for Base.indexed_iterate(::Pair{Symbol, Any}, ::Int64, ::Int64)), (6.231e-6, MethodInstance for Base.indexed_iterate(::Tuple{Any, Int64}, ::Int64, ::Int64)), (7.173e-6, MethodInstance for getindex(::Tuple{Int64, Int64}, ::Int64)), (7.554e-6, MethodInstance for getindex(::Tuple{Base.OneTo{Int64}}, ::Int64)), (8.967e-6, MethodInstance for Base.indexed_iterate(::Pair{Symbol, Any}, ::Int64)) … (2.8372e-5, MethodInstance for getproperty(::UnionAll, ::Symbol)), (2.9465e-5, MethodInstance for getproperty(::BitVector, ::Symbol)), (3.0447e-5, MethodInstance for getproperty(::Vector, ::Symbol)), (9.7421e-5, MethodInstance for LinearIndices(::Vector{Float64})), (0.000250953, MethodInstance for haskey(::IOContext{Base.PipeEndpoint}, ::Symbol)), (0.00025398, MethodInstance for print(::IOContext{Base.PipeEndpoint}, ::Char)), (0.000281179, MethodInstance for print(::IOContext{Base.PipeEndpoint}, ::String)), (0.000329814, MethodInstance for get(::IOContext{Base.PipeEndpoint}, ::Symbol, ::Type{Any})), (0.000407225, MethodInstance for get(::IOContext{Base.PipeEndpoint}, ::Symbol, ::Bool)), (0.0012911670000000002, MethodInstance for string(::String, ::Int64, ::String))])
- Main.var"Main".OptimizeMe => (0.02279055, [(8.3034e-5, MethodInstance for Main.var"Main".OptimizeMe.howbig(::Float64)), (0.022707516, MethodInstance for Main.var"Main".OptimizeMe.main())])

ttot shows the total amount of time spent on type-inference. parcel discovered precompilable MethodInstances for four modules (Core, Base.Multimedia, Base, and OptimizeMe) that might benefit from precompile directives. These are listed in increasing order of inference time.

Let's look specifically at OptimizeMe, since that's under our control:

julia> pcmod = pcs[end]
Main.var"Main".OptimizeMe => (0.02279055, Tuple{Float64, Core.MethodInstance}[(8.3034e-5, MethodInstance for Main.var"Main".OptimizeMe.howbig(::Float64)), (0.022707516, MethodInstance for Main.var"Main".OptimizeMe.main())])
julia> tmod, tpcs = pcmod.second;
julia> tmod
0.02279055
julia> tpcs
2-element Vector{Tuple{Float64, Core.MethodInstance}}:
- (8.3034e-5, MethodInstance for Main.var"Main".OptimizeMe.howbig(::Float64))
- (0.022707516, MethodInstance for Main.var"Main".OptimizeMe.main())

This indicates the amount of time spent specifically on OptimizeMe, plus the list of calls that could be precompiled in that module.

We could look at the other modules (packages) similarly.

SnoopCompile.write

You can generate files that contain ready-to-use precompile directives using SnoopCompile.write:

julia> SnoopCompile.write("/tmp/precompiles_OptimizeMe", pcs)
Core: no precompile statements out of 1.913e-6
-Base.Multimedia: no precompile statements out of 5.54e-6
-Base: precompiled 0.0012911670000000002 out of 0.003207727
-Main.var"Main".OptimizeMe: precompiled 0.022707516 out of 0.02279055

You'll now find a directory /tmp/precompiles_OptimizeMe, and inside you'll find files for modules that could have precompile directives added manually. The contents of the last of these should be recognizable:

function _precompile_()
+Using @snoop_inference to emit manual precompile directives · SnoopCompile

Using @snoop_inference to emit manual precompile directives

In a few cases, it may be inconvenient or impossible to precompile using a workload. Some examples might be:

  • an application that opens graphical windows
  • an application that connects to a database
  • an application that creates, deletes, or rewrites files on disk

In such cases, one alternative is to create a manual list of precompile directives using Julia's precompile(f, argtypes) function.

Warning

Manual precompile directives are much more likely to "go stale" as the package is developed: precompile does not throw an error if a method for the given argtypes cannot be found. They are also more likely to be dependent on the Julia version, operating system, or CPU architecture. Whenever possible, it's safer to use a workload.
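
The "go stale" failure mode can be sketched with Base Julia alone. Here howbig_demo is a hypothetical function defined only for this illustration; the sketch relies on precompile returning a Bool rather than throwing when no matching method exists:

```julia
# `howbig_demo` is a hypothetical function used only for this illustration
howbig_demo(x::Float64) = x > 1e6 ? "big" : "small"

# A directive that matches an existing method
ok = precompile(howbig_demo, (Float64,))

# A stale directive: there is no method accepting a `String`, yet no error is thrown
stale = precompile(howbig_demo, (String,))

# Checking the return value is one way to notice stale directives
ok    || @warn "precompile directive for howbig_demo(::Float64) failed"
stale || @warn "precompile directive for howbig_demo(::String) is stale"
```

Checking the returned Bool, for example in a package's test suite, is one possible way to catch directives that have silently stopped matching.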

precompile directives have to be emitted by the module that owns the method and/or types. SnoopCompile comes with a tool, parcel, that splits out the "root-most" precompilable MethodInstances into their constituent modules. This will typically correspond to the bottom row of boxes in the flame graph. When some of those boxes are not naively precompilable, the result will include MethodInstances from higher up in the call tree.

Let's use SnoopCompile.parcel on our OptimizeMe demo:

julia> using SnoopCompileCore, SnoopCompile # here we need the SnoopCompile path for the next line (normally you should wait until after data collection is complete)
julia> include(joinpath(pkgdir(SnoopCompile), "examples", "OptimizeMe.jl"))
Main.var"Main".OptimizeMe
julia> tinf = @snoop_inference OptimizeMe.main();
lotsa containers:
julia> ttot, pcs = SnoopCompile.parcel(tinf);
julia> ttot
0.06976970299999999
julia> pcs
4-element Vector{Pair{Module, Tuple{Float64, Vector{Tuple{Float64, Core.MethodInstance}}}}}:
+ Core => (1.914e-6, [(1.914e-6, MethodInstance for (NamedTuple{(:sizehint,)})(::Tuple{Int64}))])
+ Base.Multimedia => (7.053e-6, [(7.053e-6, MethodInstance for MIME(::String))])
+ Base => (0.0034157989999999993, [(1.523e-6, MethodInstance for IOContext(::IOBuffer, ::IOContext{Base.PipeEndpoint})), (1.633e-6, MethodInstance for LinearIndices(::Tuple{Base.OneTo{Int64}})), (3.937e-6, MethodInstance for IOContext(::IOContext{Base.PipeEndpoint}, ::Base.ImmutableDict{Symbol, Any})), (6.182e-6, MethodInstance for Base.indexed_iterate(::Tuple{String, Bool}, ::Int64, ::Int64)), (6.402e-6, MethodInstance for getindex(::Tuple{Int64, Int64}, ::Int64)), (6.593e-6, MethodInstance for Base.indexed_iterate(::Tuple{Int64, Int64}, ::Int64, ::Int64)), (6.652e-6, MethodInstance for Base.indexed_iterate(::Pair{Symbol, Any}, ::Int64, ::Int64)), (7.083e-6, MethodInstance for Base.indexed_iterate(::Tuple{Any, Int64}, ::Int64, ::Int64)), (8.025e-6, MethodInstance for getindex(::Tuple{Base.OneTo{Int64}}, ::Int64)), (1.0019e-5, MethodInstance for Base.indexed_iterate(::Pair{Symbol, Any}, ::Int64)) … (2.9906e-5, MethodInstance for getproperty(::UnionAll, ::Symbol)), (3.1329e-5, MethodInstance for getproperty(::Vector, ::Symbol)), (3.2872e-5, MethodInstance for getproperty(::BitVector, ::Symbol)), (0.000113894, MethodInstance for LinearIndices(::Vector{Float64})), (0.000261509, MethodInstance for print(::IOContext{Base.PipeEndpoint}, ::Char)), (0.000264282, MethodInstance for haskey(::IOContext{Base.PipeEndpoint}, ::Symbol)), (0.000286777, MethodInstance for print(::IOContext{Base.PipeEndpoint}, ::String)), (0.00034920199999999995, MethodInstance for get(::IOContext{Base.PipeEndpoint}, ::Symbol, ::Type{Any})), (0.00045692500000000006, MethodInstance for get(::IOContext{Base.PipeEndpoint}, ::Symbol, ::Bool)), (0.0013621279999999998, MethodInstance for string(::String, ::Int64, ::String))])
+ Main.var"Main".OptimizeMe => (0.024680413999999998, [(0.000109535, MethodInstance for Main.var"Main".OptimizeMe.howbig(::Float64)), (0.024570878999999997, MethodInstance for Main.var"Main".OptimizeMe.main())])

ttot shows the total amount of time spent on type-inference. parcel discovered precompilable MethodInstances for four modules (Core, Base.Multimedia, Base, and OptimizeMe) that might benefit from precompile directives. These are listed in increasing order of inference time.

Let's look specifically at OptimizeMe, since that's under our control:

julia> pcmod = pcs[end]
Main.var"Main".OptimizeMe => (0.024680413999999998, Tuple{Float64, Core.MethodInstance}[(0.000109535, MethodInstance for Main.var"Main".OptimizeMe.howbig(::Float64)), (0.024570878999999997, MethodInstance for Main.var"Main".OptimizeMe.main())])
julia> tmod, tpcs = pcmod.second;
julia> tmod
0.024680413999999998
julia> tpcs
2-element Vector{Tuple{Float64, Core.MethodInstance}}:
+ (0.000109535, MethodInstance for Main.var"Main".OptimizeMe.howbig(::Float64))
+ (0.024570878999999997, MethodInstance for Main.var"Main".OptimizeMe.main())

This indicates the amount of time spent specifically on OptimizeMe, plus the list of calls that could be precompiled in that module.
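
The shape of one pcs entry can be explored without SnoopCompile by using a mock. In the sketch below, plain strings stand in for the Module and the MethodInstances (an assumption made so the example runs anywhere), and the timings are copied from the output shown above:

```julia
# Mock of a single `pcs` entry: strings replace the Module and MethodInstances,
# and the timings are copied from the REPL output in the text.
pcmod = "OptimizeMe" => (0.024680413999999998,
                         [(0.000109535, "howbig(::Float64)"),
                          (0.024570878999999997, "main()")])

# Same destructuring as in the text
tmod, tpcs = pcmod.second

# In this output, the per-module total equals the sum of the individual times
total = sum(first, tpcs)

# Entries appear in increasing order of time, so the costliest one is last
tmax, slowest = last(tpcs)
println("module total = ", tmod, "; most expensive entry: ", slowest)
```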

We could look at the other modules (packages) similarly.

SnoopCompile.write

You can generate files that contain ready-to-use precompile directives using SnoopCompile.write:

julia> SnoopCompile.write("/tmp/precompiles_OptimizeMe", pcs)
Core: no precompile statements out of 1.914e-6
+Base.Multimedia: no precompile statements out of 7.053e-6
+Base: precompiled 0.0013621279999999998 out of 0.0034157989999999993
+Main.var"Main".OptimizeMe: precompiled 0.024570878999999997 out of 0.024680413999999998

You'll now find a directory /tmp/precompiles_OptimizeMe, and inside you'll find files for modules that could have precompile directives added manually. The contents of the last of these should be recognizable:

function _precompile_()
     ccall(:jl_generating_output, Cint, ()) == 1 || return nothing
     Base.precompile(Tuple{typeof(main)})   # time: 0.4204474
-end

The first ccall line ensures we only pay the cost of running these precompile directives if we're building the package; this is relevant mostly if you're running Julia with --compiled-modules=no, which can be a convenient way to disable precompilation and examine packages in their "native state." (It would also matter if you've set __precompile__(false) at the top of your module, but if so why are you reading this?)

This file is ready to be moved into the OptimizeMe repository and included in your module definition.

You might also consider submitting some of the other files (or their precompile directives) to the packages you depend on.

+end

The first ccall line ensures we only pay the cost of running these precompile directives if we're building the package; this is relevant mostly if you're running Julia with --compiled-modules=no, which can be a convenient way to disable precompilation and examine packages in their "native state." (It would also matter if you've set __precompile__(false) at the top of your module, but if so why are you reading this?)

This file is ready to be moved into the OptimizeMe repository and included in your module definition.
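
As a rough sketch of what that might look like, the module below is a hypothetical stand-in for OptimizeMe with simplified function bodies. In a real package, _precompile_ would live in the generated file and be brought in with include (the file name you choose is up to you); it is inlined here so the example is self-contained:

```julia
module OptimizeMeSketch   # hypothetical stand-in for the real OptimizeMe module

howbig(x::Float64) = x > 1e6 ? "big" : "small"
main() = howbig(2.0)

# In a real package this function would come from the generated file via `include`.
function _precompile_()
    # Only do the work while Julia is generating output (i.e., building the package)
    ccall(:jl_generating_output, Cint, ()) == 1 || return nothing
    Base.precompile(Tuple{typeof(main)})
    Base.precompile(Tuple{typeof(howbig), Float64})
    return nothing
end

_precompile_()   # run the directives as the last step of the module definition

end # module
```

Calling _precompile_() at the end of the module ensures that all the methods it refers to have already been defined.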

You might also consider submitting some of the other files (or their precompile directives) to the packages you depend on.

diff --git a/dev/tutorials/snoop_llvm/index.html b/dev/tutorials/snoop_llvm/index.html index b740f12f..f6368baa 100644 --- a/dev/tutorials/snoop_llvm/index.html +++ b/dev/tutorials/snoop_llvm/index.html @@ -16,4 +16,4 @@ @ Base reflection.jl:1299 ...

This will write two files, "func_names.csv" and "llvm_timings.yaml", in your current working directory. Let's look at what was read from these files:

julia> times
ERROR: UndefVarError: `times` not defined in `Main.var"Main"`
 Suggestion: check for spelling errors or missing imports.
julia> info
ERROR: UndefVarError: `info` not defined in `Main.var"Main"`
-Suggestion: check for spelling errors or missing imports.
+Suggestion: check for spelling errors or missing imports.