
Reuse memory for ELBO calculation #75

Open · wants to merge 2 commits into main

Conversation

cscherrer

I noticed this line:

u = Random.randn!(rng, similar(μ, N, ndraws))

This allocates a new u on each iteration, which adds some overhead. This PR removes the per-iteration allocation by allocating u once in maximize_elbo and reusing it.
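The change is the standard preallocate-and-reuse idiom. A minimal standalone sketch of the two patterns (illustrative function names, not Pathfinder's actual internals):

```julia
using Random

# Pattern in the current code: a fresh matrix is allocated on every iteration.
function draws_alloc(rng, μ, N, ndraws, iters)
    total = 0.0
    for _ in 1:iters
        u = Random.randn!(rng, similar(μ, N, ndraws))  # new allocation each time
        total += sum(u)
    end
    return total
end

# Pattern in this PR: allocate once up front, then overwrite in place.
function draws_reuse(rng, μ, N, ndraws, iters)
    u = similar(μ, N, ndraws)  # single allocation
    total = 0.0
    for _ in 1:iters
        Random.randn!(rng, u)  # refill the same buffer
        total += sum(u)
    end
    return total
end
```

Both consume the RNG identically, so for the same seed they produce the same results; the second version simply allocates iters − 1 fewer matrices.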

Some caveats:

  1. The change didn't have nearly the impact I expected.
  2. The tests are not yet updated for this.

That said, here's the test case:

using Pathfinder

rosenbrock(x) = @inbounds -(1-x[1])^2  - 100 * (x[2] - x[1]^2)^2

using BenchmarkTools
result = pathfinder(rosenbrock; init=zeros(2))

The result is

julia> result = pathfinder(rosenbrock; init=zeros(2))
Single-path Pathfinder result
  tries: 1
  draws: 5
  fit iteration: 7 (total: 22)
  fit ELBO: -2.77 ± 0.78
  fit distribution: Distributions.MvNormal{Float64, Pathfinder.WoodburyPDMat{Float64, LinearAlgebra.Diagonal{Float64, Vector{Float64}}, Matrix{Float64}, Matrix{Float64}, LinearAlgebra.Diagonal{Float64, Vector{Float64}}, LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}, LinearAlgebra.UpperTriangular{Float64, Matrix{Float64}}}, Vector{Float64}}(
dim: 2
μ: [0.658221, 0.418772]
Σ: [0.122998 0.13468; 0.13468 0.151634]
)

Note that there are 22 ELBO evaluations.

Now for benchmarking.

BEFORE

julia> @benchmark result = pathfinder(rosenbrock; init=zeros(2))
BenchmarkTools.Trial: 4835 samples with 1 evaluation.
 Range (min … max):  902.530 μs …   8.335 ms  ┊ GC (min … max): 0.00% … 79.29%
 Time  (median):     966.893 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.030 ms ± 514.814 μs  ┊ GC (mean ± σ):  3.46% ±  6.15%

 Memory estimate: 293.51 KiB, allocs estimate: 3359.

AFTER

julia> @benchmark result = pathfinder(rosenbrock; init=zeros(2))
BenchmarkTools.Trial: 4957 samples with 1 evaluation.
 Range (min … max):  898.102 μs …  12.059 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     956.924 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.005 ms ± 503.189 μs  ┊ GC (mean ± σ):  3.37% ± 6.15%

 Memory estimate: 290.55 KiB, allocs estimate: 3338.

Now 3359 - 3338 = 21, because we've allocated once instead of 22 times.

It's very minor in this case, but maybe much larger problems could require many more ELBO evaluations. Then again, in those cases the non-allocation overhead will also be much higher.

@codecov

codecov bot commented Jun 9, 2022

Codecov Report

Merging #75 (e00e7ce) into main (3516702) will decrease coverage by 14.46%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##             main      #75       +/-   ##
===========================================
- Coverage   92.89%   78.43%   -14.47%     
===========================================
  Files          13       13               
  Lines         521      524        +3     
===========================================
- Hits          484      411       -73     
- Misses         37      113       +76     
Impacted Files Coverage Δ
src/elbo.jl 80.00% <100.00%> (+2.72%) ⬆️
src/mvnormal.jl 100.00% <100.00%> (ø)
src/optimize.jl 54.23% <0.00%> (-40.68%) ⬇️
src/woodbury.jl 65.34% <0.00%> (-34.66%) ⬇️
src/transducers.jl 81.25% <0.00%> (-18.75%) ⬇️
src/resample.jl 83.33% <0.00%> (-16.67%) ⬇️
src/multipath.jl 53.70% <0.00%> (-12.97%) ⬇️
src/singlepath.jl 81.57% <0.00%> (-7.90%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@sethaxen
Member

sethaxen commented Jun 9, 2022

Thanks @cscherrer for the analysis and PR! I had considered reusing u before but ultimately decided not to in the interest of allowing the user to inspect all intermediates (they can currently access these draws if they desire). I'm not certain how useful that is in the end though.

The main reason I considered reusing u was memory. For very large models, say those with 10^6+ parameters, if Pathfinder runs 10^3 iterations, then we've already stored O(10^9) parameters. Drawing 5 draws from each distribution adds roughly 5 × 10^9 more. So at worst we're looking at a severalfold increase in the memory requirements by not reusing u. And it's not clear to me how big of a problem that is.
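As a quick sanity check of that arithmetic (using exactly the numbers above):

```julia
nparams = 10^6  # model dimension
niters  = 10^3  # length of the optimizer trace
ndraws  = 5     # draws per fit distribution

stored = nparams * niters           # parameters already kept across iterations
drawn  = ndraws * nparams * niters  # extra values kept if u is never reused
ratio  = drawn ÷ stored             # the overhead factor is just ndraws
```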

    EE = Core.Compiler.return_type(
-        elbo_and_samples, Tuple{typeof(rng),typeof(logp),eltype(dists),Int}
+        elbo_and_samples!, Tuple{typeof(rng),typeof(u),typeof(logp),eltype(dists),Int}
    )
    estimates = similar(dists, EE)
    isempty(estimates) && return 0, estimates
    Folds.map!(estimates, dists, executor) do dist
Member

Since u is being overwritten here, I think we cannot use Transducers, or different threads could be writing to it at the same time. This will probably just need to be made a simple map

end
_, iteration_opt = _findmax(estimates |> Transducers.Map(est -> est.value))
return iteration_opt, estimates
end
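One way to address the data-race concern raised above while keeping buffer reuse is one buffer (and one RNG) per thread. A standalone sketch, not Pathfinder's actual code; note that indexing by threadid() is only safe when tasks don't migrate between threads, hence the :static schedule:

```julia
using Random

function elbo_values_threaded(means, N, ndraws)
    nt   = Threads.nthreads()
    us   = [zeros(N, ndraws) for _ in 1:nt]          # one buffer per thread
    rngs = [Random.MersenneTwister(i) for i in 1:nt] # one RNG per thread
    out  = zeros(length(means))
    Threads.@threads :static for i in eachindex(means)
        tid = Threads.threadid()
        u = Random.randn!(rngs[tid], us[tid])   # overwrite this thread's buffer only
        out[i] = means[i] + sum(u) / length(u)  # stand-in for an ELBO estimate
    end
    return out
end
```

A simple map, as suggested, avoids the problem entirely at the cost of serial execution; per-thread buffers keep both the parallelism and the reuse.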

-function elbo_and_samples(rng, logp, dist, ndraws)
+function elbo_and_samples!(rng, u, logp, dist, ndraws)
     ϕ, logqϕ = rand_and_logpdf(rng, dist, ndraws)
Member
draws will need to be removed from ELBOEstimate as well.

@cscherrer
Author

Thanks @sethaxen , memory overhead is a great point. One possibility for getting the best of both worlds is a callback that can optionally copy the draws at each step. You could even copy conditionally, like thinning in MCMC, or based on some predicate.

There's a risk here of over-engineering, solving problems that don't exist, or at least don't exist yet. But then it's also useful to get to an API that can scale.
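The callback idea might look like this; everything here is hypothetical (the function name and keyword are invented for illustration, not part of Pathfinder):

```julia
using Random

# Shared buffer u is overwritten every iteration; `callback` decides, per
# iteration, whether to snapshot it (e.g. thinning: keep every 10th draw).
function maximize_with_snapshots!(u, niters; callback = i -> i % 10 == 0)
    snapshots = Dict{Int,Matrix{Float64}}()
    for i in 1:niters
        Random.randn!(u)                         # reuse the single buffer
        callback(i) && (snapshots[i] = copy(u))  # copy only on request
    end
    return snapshots
end
```

With the default predicate, 20 iterations keep only 2 copies; a user who wants every intermediate can pass `callback = _ -> true` and recover the current behavior.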

@sethaxen
Member

Keeping in mind #15, the goal is ultimately to allow the user to provide an object that configures an optimization procedure over the trace of distributions, with the default being ELBOMaximization or something like that. Such an object could have a type parameter that configures whether the samples drawn are kept or overwritten. So I'll take a stab at designing that interface. Then it should be pretty easy to finish up this PR based on that.
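A minimal sketch of how such a type parameter could work, assuming the ELBOMaximization name floated above (the field-free struct and the record_draws helper are invented for illustration):

```julia
# A compile-time flag on the optimizer config chooses between snapshotting
# the draws and handing back the shared, reused buffer.
struct ELBOMaximization{keep_draws} end

record_draws(::ELBOMaximization{true},  u) = copy(u)  # independent copy per iteration
record_draws(::ELBOMaximization{false}, u) = u        # shared buffer, no allocation
```

Because the flag is a type parameter, the branch is resolved at compile time, so the reuse path pays no runtime cost for the configurability.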
