Allow algorithms to use draws of model fits for inits #936

SteveBronder · 2024-03-19T22:10:27Z

Submission Checklist

Run unit tests
Declare copyright holder and agree to license (see below)

Summary

This PR does two things

It allows all algorithms to use inits that are draws object from posterior or fit objects from cmdstanr
Instead of dispatching one process per chain for sampling it instead uses cmdstan's internal parallel chains. This is nice because it means that the data is only copied over for cmdstan once. So if a user wants to run 100 chains with 10GB of data they would have needed 1000GB of memory for the data where now they need only 10GB for the data. Reducing copies of the data also cuts down on memory bloat that can thrash the cpu caches.

Removing the need for multiple procs was done to support multiple inits for pathfinder. Previously, pathfinder was the only algorithm that used cmdstan's internal parallelism for multiple paths. While I was coding up support for pathfinder to accept a list of inits I ran into some odd problems in the lower level parts of cmdstanr that would require a bunch of if statements just for pathfinder during execution and creation/reading/writing files. imo I thought it was best to push sampling to use multiple chains as that helped keep the code a little cleaner.

Using Stan's internal parallel chains has one major downside which is that cmdstan is all or nothing for chain initialization failure. If one chain fails to initialize then the whole program fails. I have an issue (stan-dev/stan#3275) to address this, but I think personally the benefits outweigh the costs.

One other issue is that I couldn't completely remove proc since cmdstan does not use the generate_quantities that uses Stan's internal parallelism. I have a PR (stan-dev/cmdstan#1256) to fix this so the next release of cmdstan we can use that and only need to ever spin up 1 process for many chain algorithms.

Copyright and Licensing

Please list the copyright holder for the work you are submitting
(this will be you or your assignee, such as a university or company):
Simons Foundation

By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses:

Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

…inits

SteveBronder · 2024-03-19T22:10:56Z

Pinging @avehtari if you have time to take this for a spin!

…inits

SteveBronder · 2024-03-20T14:27:29Z

After some comments from @bob-carpenter and @WardBrian in stan-dev/stan#3275 I think the default behavior should be that if any chains fail to init then all chains fail to init. This lines up with cmdstan and cmdstanpy's behavior and is usually caused by something being wrong with the model

avehtari · 2024-03-20T16:00:44Z

It seems to work. I don't see documentation on how does the init work if given a Pathfinder object with many draws. Can you clarify?

SteveBronder · 2024-03-20T18:45:10Z

@avehtari it uses the same logic you had in your original code. For anything that does an approximation we cut the draws down to uniques if less than 95 percent of the samples are unique.

  if (unique_draws < (0.95 * nrow(draws_df))) {
    temp_df = aggregate(.draw ~ lw, data = draws_df, FUN = min)
    draws_df = posterior::as_draws_df(merge(temp_df, draws_df, by = 'lw'))
    draws_df$pareto_weight = exp(draws_df$lw - max(draws_df$lw))
  } else {
    draws_df$pareto_weight = posterior::pareto_smooth(
      exp(draws_df$lw - max(draws_df$lw)), tail = "right")[["x"]]
  }

SteveBronder · 2024-03-20T18:46:48Z

@jgabry I need to minute to fix a few little things. I forgot about the performance but mingw has when using thread_local variables that makes stan very slow when using threading. I'm changing things so that if the user has wsl, mac, or linux then threading is enabled by default and we do threading in cmdstan, but if the user uses mingw then we just do several processes still

avehtari · 2024-03-20T19:25:43Z

@avehtari it uses the same logic you had in your original code. For anything that does an approximation we cut the draws down to uniques if less than 95 percent of the samples are unique.

And if, e.g., 4 inits are needed, samples 4 inits from the weights using the weights and without replacement?

jgabry · 2024-03-20T19:49:28Z

@jgabry I need to minute to fix a few little things.

No problem. I can wait to take a look (I have a bunch of other things to work on anyway this afternoon). Thanks again for working on this!

SteveBronder · 2024-03-21T03:13:01Z

Yep!

avehtari · 2024-03-21T07:39:00Z

The output from sampling is a bit confusing

Chain 1 
Chain 1 Chain [1] Iteration: 100 / 200 [ 50%]  (Warmup) 
Chain 1 Chain [1] Iteration: 101 / 200 [ 50%]  (Sampling) 
Chain 1 Chain [4] Iteration: 100 / 200 [ 50%]  (Warmup) 
Chain 1 Chain [4] Iteration: 101 / 200 [ 50%]  (Sampling) 
Chain 1 Chain [2] Iteration: 100 / 200 [ 50%]  (Warmup) 
Chain 1 Chain [2] Iteration: 101 / 200 [ 50%]  (Sampling) 
Chain 1 Chain [3] Iteration: 100 / 200 [ 50%]  (Warmup) 
Chain 1 Chain [3] Iteration: 101 / 200 [ 50%]  (Sampling) 
Chain 1 Chain [1] Iteration: 200 / 200 [100%]  (Sampling) 
Chain 1 finished in 4.6 seconds.
MCMC sampling from model 1 posterior took 6.1 sec

All lines start with "Chain 1", and then have "Chain [number]"

avehtari · 2024-03-21T08:15:02Z

If I try to initialize with a wrong fit object missing a parameter

fit1 <- model1$sample(data=standata1, iter_warmup=100, iter_sampling=100,
                      chains=4, threads=4,
                      init=fit2)

the error message is

Error: The following variables are missing in the draws object: {'intercept'}

It would be better if it would say missing in the fit object, but if that information is not available at the point of error, then this is acceptable.

avehtari · 2024-03-21T09:11:57Z

> pth5 <- model5$pathfinder(data = standata5, init=0.1,
                          num_paths=10, single_path_draws=40, draws=400,
                          history_size=50, max_lbfgs_iters=100, refresh=0, seed=1)
Pareto k value (19.100021) is greater than 0.7. Importance resampling was not able to improve the approximation, which may indicate that the approximation itself is poor. 
Finished in  11.6 seconds.

> fit5 <- model5$sample(data=standata5, iter_warmup=100, iter_sampling=100,
                      chains=4, threads=4,
                      init=pth5)
Error in process_init_approx(init, num_procs, model_variables, warn_partial) : 
  Not enough distinct draws (4) to create inits.

Here the message Error in process_init_approx could be dropped.
"Not enough distinct draws (4) to create inits." could say
"Not enough distinct draws (4) in the Pathfinder object to create inits.
Try running Pathfinder with psis_resample=FALSE"

> pth5b <- model5$pathfinder(data = standata5, init=0.1,
                          num_paths=10, single_path_draws=40, draws=400,
                          history_size=50, max_lbfgs_iters=100, refresh=0,
                          seed=1, psis_resample=FALSE)
Finished in  11.7 seconds.
> fit5 <- model5$sample(data=standata5, iter_warmup=100, iter_sampling=100,
                      chains=4, threads=4,
                      init=pth5b)

Pareto k-hat = 19.1. Mean does not exist, making empirical mean estimate of the draws not applicable.
Error in posterior::pareto_smooth(exp(draws_df$lw - max(draws_df$lw)),  : 
  subscript out of bounds
In addition: Warning message:
In read_csv_metadata(output_file) : NAs introduced by coercion

Something wrong here, as my R code create_inits() works for this

SteveBronder · 2024-03-21T19:30:13Z

but if that information is not available at the point of error, then this is acceptable.

That warning is made at a much lower level. I'll see if I can move it up

All lines start with "Chain 1", and then have "Chain [number]"

Yes. tbh I think I'm going to close this PR and open up a simpler one that just does the inits. I thought doing multi path pathfinder inits would be really messy without also enabling parallelism from within Stan, but now that I understand the codebase more I think it's actually pretty simple. To get parallelism from within cmdstan it's going to require a lot of changes and I think needs to be a separate PR

"Not enough distinct draws (4) in the Pathfinder object to create inits. Try running Pathfinder with psis_resample=FALSE"

Fixed

Something wrong here, as my R code create_inits() works for this

Can you send me your standata5? Just to make sure I can reproduce the error

SteveBronder and others added 8 commits March 4, 2024 17:41

start function to allow init from pathfinder object

8a1f45a

update

f532617

Merge remote-tracking branch 'origin/master' into feature/pathfinder-…

3503d5e

…inits

update

525c84b

fixing gq tests

bafc36e

allow init to take draws

6e9ead6

Fix up a few of the TODOs before merge

2cdd82c

turns of partial failed chains

b1f9fdc

SteveBronder added 2 commits March 20, 2024 10:08

Merge remote-tracking branch 'origin/master' into feature/pathfinder-…

a078b18

…inits

update docs and remove checks for possibly failed output files

699480b

SteveBronder requested a review from jgabry March 20, 2024 14:27

SteveBronder removed the request for review from jgabry March 20, 2024 18:45

fixing parallel stuff for windows

f31113a

SteveBronder closed this Mar 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow algorithms to use draws of model fits for inits #936

Allow algorithms to use draws of model fits for inits #936

SteveBronder commented Mar 19, 2024

SteveBronder commented Mar 19, 2024

SteveBronder commented Mar 20, 2024

avehtari commented Mar 20, 2024

SteveBronder commented Mar 20, 2024

SteveBronder commented Mar 20, 2024

avehtari commented Mar 20, 2024

jgabry commented Mar 20, 2024

SteveBronder commented Mar 21, 2024

avehtari commented Mar 21, 2024

avehtari commented Mar 21, 2024

avehtari commented Mar 21, 2024

SteveBronder commented Mar 21, 2024

Allow algorithms to use draws of model fits for inits #936

Allow algorithms to use draws of model fits for inits #936

Conversation

SteveBronder commented Mar 19, 2024

Submission Checklist

Summary

Copyright and Licensing

SteveBronder commented Mar 19, 2024

SteveBronder commented Mar 20, 2024

avehtari commented Mar 20, 2024

SteveBronder commented Mar 20, 2024

SteveBronder commented Mar 20, 2024

avehtari commented Mar 20, 2024

jgabry commented Mar 20, 2024

SteveBronder commented Mar 21, 2024

avehtari commented Mar 21, 2024

avehtari commented Mar 21, 2024

avehtari commented Mar 21, 2024

SteveBronder commented Mar 21, 2024