JOSS review: sinusoid examples and plots in paper (#298)
* Paper and docs Fig. 1 comparing true and observed signals: changed 6.99 -> 7.0 in the legend; edited the blue vertical line with arrows marking the observation range so that it is now centered on the observed mean, following reviewer comments.

* Paper and docs contour plots showing how the sinusoid outputs vary with respect to the parameters: fixed the color limits across all contour plots for easier comparison.

* Docs for sampling with Random Features: added a table with the mean and std of parameter values to directly compare against the GP results.

* Paper: updated the sentence comparing ABC and CES to avoid making too strong a claim; instead, just comment on some limitations of ABC and CES. Also added a reference to the high-dimensional chapter of the ABC Handbook.
lm2612 authored Apr 4, 2024
1 parent fb4fa50 commit 2827e71
Showing 15 changed files with 55 additions and 9 deletions.
Binary file modified docs/src/assets/sinusoid_GP_emulator_contours.png
Binary file modified docs/src/assets/sinusoid_GP_errors_contours.png
Binary file modified docs/src/assets/sinusoid_RF_emulator_contours.png
Binary file modified docs/src/assets/sinusoid_RF_errors_contours.png
Binary file modified docs/src/assets/sinusoid_groundtruth_contours.png
Binary file modified docs/src/assets/sinusoid_true_vs_observed_signal.png
13 changes: 9 additions & 4 deletions docs/src/examples/sinusoid_example.md
@@ -402,10 +402,10 @@ println("Vertical shift mean: ", mean(constrained_posterior["vert_shift"]), ", s
This gives:


| parameters | mean | std |
| :---------------- | :------: | :----: |
| amplitude | 3.0382 | 0.2861 |
| vert_shift | 6.3897 | 0.4601 |
| parameters | mean | std |
| :---------------- | :------: | :----: |
| amplitude | 3.0382 | 0.2880 |
| vert_shift | 6.3774 | 0.4586 |

This is in agreement with the true $\theta=(3.0, 7.0)$ and with the observational covariance matrix we provided $\Gamma=0.2 * I$ (i.e., a standard deviation of approx. $0.45$). `CalibrateEmulateSample.jl` has built-in plotting
recipes to help us visualize the prior and posterior distributions. Note that these are the
@@ -433,6 +433,11 @@ $\theta_2$ below, with the marginal distributions on each axis.
We can repeat the sampling method using the random features emulator instead of the Gaussian
process and we find similar results:

| parameters | mean | std |
| :---------------- | :------: | :----: |
| amplitude | 3.3210 | 0.7216 |
| vert_shift | 6.3986 | 0.5098 |
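For reference, summary statistics like those in the tables above are just sample means and standard deviations over the MCMC chain, as in the docs' `println` calls. A minimal sketch, with a hypothetical `constrained_posterior` dictionary standing in for the example's posterior samples:

```julia
using Statistics

# Hypothetical stand-in for the constrained posterior samples produced by the
# MCMC step in the example; each entry is a vector of samples for one parameter.
constrained_posterior = Dict(
    "amplitude" => [3.0, 3.1, 2.9, 3.2, 3.0],
    "vert_shift" => [6.4, 6.3, 6.5, 6.4, 6.3],
)

# Report the per-parameter mean and standard deviation of the samples.
for (name, samples) in constrained_posterior
    println(name, ": mean = ", mean(samples), ", std = ", std(samples))
end
```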

![RF_2d_posterior](../assets/sinusoid_MCMC_hist_RF.png)

It is reassuring to see that our uncertainty quantification methods are robust to the different emulator
22 changes: 22 additions & 0 deletions examples/Sinusoid/emulate.jl
@@ -156,11 +156,15 @@ rf_std_grid = reshape(permutedims(reduce(vcat, [x' for x in rf_std]), (2, 1)), (
## Plot
# First, we will plot the ground truth. We have 2 parameters and 2 outputs, so we will create a contour plot
# for each output to show how they vary against the two inputs.
range_clims = (0, 8)
mean_clims = (-6, 10)

p1 = contour(
amp_range,
vshift_range,
g_true_grid[1, :, :];
fill = true,
clims = range_clims,
xlabel = "Amplitude",
ylabel = "Vertical Shift",
title = "True Sinusoid Range",
@@ -170,6 +174,7 @@ p2 = contour(
vshift_range,
g_true_grid[2, :, :];
fill = true,
clims = mean_clims,
xlabel = "Amplitude",
ylabel = "Vertical Shift",
title = "True Sinusoid Mean",
@@ -188,6 +193,7 @@ p1 = contour(
vshift_range,
gp_grid[1, :, :];
fill = true,
clims = range_clims,
xlabel = "Amplitude",
ylabel = "Vertical Shift",
title = "GP Sinusoid Range",
@@ -200,6 +206,7 @@ p2 = contour(
vshift_range,
gp_grid[2, :, :];
fill = true,
clims = mean_clims,
xlabel = "Amplitude",
ylabel = "Vertical Shift",
title = "GP Sinusoid Mean",
@@ -214,6 +221,7 @@ p1 = contour(
vshift_range,
rf_grid[1, :, :];
fill = true,
clims = range_clims,
xlabel = "Amplitude",
ylabel = "Vertical Shift",
title = "RF Sinusoid Range",
@@ -224,6 +232,7 @@ p2 = contour(
vshift_range,
rf_grid[2, :, :];
fill = true,
clims = mean_clims,
xlabel = "Amplitude",
ylabel = "Vertical Shift",
title = "RF Sinusoid Mean",
@@ -238,12 +247,16 @@ savefig(p, joinpath(data_save_directory, "sinusoid_RF_emulator_contours.png"))

# Next, we plot uncertainty estimates from the GP and RF emulators
# Plot GP std estimates
range_std_clims = (0, 2)
mean_std_clims = (0, 1)

p1 = contour(
amp_range,
vshift_range,
gp_std_grid[1, :, :];
c = :cividis,
fill = true,
clims = range_std_clims,
xlabel = "Amplitude",
ylabel = "Vertical Shift",
title = "GP 1σ in Sinusoid Range",
@@ -254,6 +267,7 @@ p2 = contour(
gp_std_grid[2, :, :];
c = :cividis,
fill = true,
clims = mean_std_clims,
xlabel = "Amplitude",
ylabel = "Vertical Shift",
title = "GP 1σ in Sinusoid Mean",
@@ -269,6 +283,7 @@ p1 = contour(
rf_std_grid[1, :, :];
c = :cividis,
fill = true,
clims = range_std_clims,
xlabel = "Amplitude",
ylabel = "Vertical Shift",
title = "RF 1σ in Sinusoid Range",
@@ -279,6 +294,7 @@ p2 = contour(
rf_std_grid[2, :, :];
c = :cividis,
fill = true,
clims = mean_std_clims,
xlabel = "Amplitude",
ylabel = "Vertical Shift",
title = "RF 1σ in Sinusoid Mean",
@@ -291,12 +307,15 @@ savefig(p, joinpath(data_save_directory, "sinusoid_RF_emulator_std_contours.png"
# Finally, we should validate how accurate the emulators are by looking at the absolute difference between emulator
# predictions and the ground truth.
gp_diff_grid = abs.(gp_grid - g_true_grid)
range_diff_clims = (0, 1)
mean_diff_clims = (0, 1)
p1 = contour(
amp_range,
vshift_range,
gp_diff_grid[1, :, :];
c = :cividis,
fill = true,
clims = range_diff_clims,
xlabel = "Amplitude",
ylabel = "Vertical Shift",
title = "GP error in Sinusoid Range",
@@ -307,6 +326,7 @@ p2 = contour(
gp_diff_grid[2, :, :];
c = :cividis,
fill = true,
clims = mean_diff_clims,
xlabel = "Amplitude",
ylabel = "Vertical Shift",
title = "GP error in Sinusoid Mean",
@@ -321,6 +341,7 @@ p1 = contour(
rf_diff_grid[1, :, :];
c = :cividis,
fill = true,
clims = range_diff_clims,
xlabel = "Amplitude",
ylabel = "Vertical Shift",
title = "RF error in Sinusoid Range",
@@ -331,6 +352,7 @@ p2 = contour(
rf_diff_grid[2, :, :];
c = :cividis,
fill = true,
clims = mean_diff_clims,
xlabel = "Amplitude",
ylabel = "Vertical Shift",
title = "RF error in Sinusoid Mean",
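The recurring change in this file — adding `clims = ...` to each `contour` call — pins the color scale so that panels sharing a quantity are directly comparable. A minimal sketch of the pattern, assuming Plots.jl and made-up grids in place of the example's emulator output:

```julia
using Plots

amp_range = range(0.5, 5.0; length = 40)
vshift_range = range(0.0, 10.0; length = 40)

# Toy stand-ins for the true and emulated output grids (rows index vshift, columns amplitude).
g_true = [a + v for v in vshift_range, a in amp_range]
g_emul = g_true .+ 0.3

# Shared color limits, as set in this commit, so both panels use one scale.
mean_clims = (-6, 10)

p1 = contour(amp_range, vshift_range, g_true; fill = true, clims = mean_clims,
             xlabel = "Amplitude", ylabel = "Vertical Shift", title = "True Sinusoid Mean")
p2 = contour(amp_range, vshift_range, g_emul; fill = true, clims = mean_clims,
             xlabel = "Amplitude", ylabel = "Vertical Shift", title = "Emulated Sinusoid Mean")
plot(p1, p2)
```

Without a shared `clims`, each panel is normalized to its own data range, and small emulator errors can look like large qualitative differences.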
8 changes: 4 additions & 4 deletions examples/Sinusoid/sinusoid_setup.jl
@@ -88,15 +88,15 @@ function generate_obs(amplitude_true, vert_shift_true, Γ; rng = Random.GLOBAL_R
color = :red,
style = :dash,
linewidth = 2,
label = "True mean: " * string(round(y2_true, digits = 2)),
label = "True mean: " * string(round(y2_true, digits = 1)),
)
plot!(
[argmax(signal_true), argmax(signal_true)],
[minimum(signal_true), maximum(signal_true)],
arrows = :both,
color = :red,
linewidth = 2,
label = "True range: " * string(round(y1_true, digits = 2)),
label = "True range: " * string(round(y1_true, digits = 1)),
)

# However, our observations are typically not noise-free, so we add some white noise to our
@@ -115,8 +115,8 @@ function generate_obs(amplitude_true, vert_shift_true, Γ; rng = Random.GLOBAL_R
label = "Observed mean: " * string(round(y2_obs, digits = 2)),
)
plot!(
[argmax(signal_true) + 10, argmax(signal_true) + 10],
[y2_true - y1_obs / 2, y2_true + y1_obs / 2],
[argmax(signal_true) + 15, argmax(signal_true) + 15],
[y2_obs - y1_obs / 2, y2_obs + y1_obs / 2],
arrows = :both,
color = :blue,
linewidth = 1,
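The last hunk above centers the observed-range arrow on the observed mean (`y2_obs`) rather than the true mean, and moves it slightly further from the true-range arrow. A sketch of the annotation pattern, assuming Plots.jl and made-up values in place of the example's observed statistics:

```julia
using Plots

x = 1:100
signal_true = 7.0 .+ 3.0 .* sin.(2π .* x ./ 50)
y1_obs, y2_obs = 6.2, 7.1  # made-up observed range and observed mean

p = plot(x, signal_true; label = "True signal")
# Vertical double-headed arrow spanning the observed range, centered on the
# observed mean, offset 15 samples right of the signal's maximum (as in the commit).
plot!([argmax(signal_true) + 15, argmax(signal_true) + 15],
      [y2_obs - y1_obs / 2, y2_obs + y1_obs / 2];
      arrows = :both, color = :blue, linewidth = 1, label = "Observed range")
```

Centering on `y2_obs` keeps the arrow consistent with the quantity it annotates, which is what the reviewer comment requested.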
19 changes: 19 additions & 0 deletions paper.bib
@@ -12,6 +12,25 @@ @article{julia
journal = {{SIAM} Review}
}

@book{Sisson:2018,
title = {Handbook of {Approximate} {Bayesian} {Computation}},
isbn = {978-1-351-64346-7},
publisher = {CRC Press},
author = {Sisson, Scott A. and Fan, Yanan and Beaumont, Mark},
year = {2018}
}

@incollection{Nott:2018,
author = {Nott, David J. and Ong, Victor M.-H. and Fan, Y. and Sisson, S. A.},
title = {High-{Dimensional} {ABC}},
booktitle = {Handbook of {Approximate} {Bayesian} {Computation}},
isbn = {978-1-315-11719-5},
chapter = {8},
pages = {211-241},
year = {2018},
publisher = {CRC Press},
}

@article{Cleary:2021,
title = {Calibrate, emulate, sample},
journal = {Journal of Computational Physics},
2 changes: 1 addition & 1 deletion paper.md
@@ -67,7 +67,7 @@ Computationally expensive computer codes for predictive modelling are ubiquitous

In Julia there are a few tools for performing non-accelerated uncertainty quantification, from classical sensitivity analysis approaches, e.g., [UncertaintyQuantification.jl](https://zenodo.org/records/10149017), GlobalSensitivity.jl [@Dixit:2022], and MCMC, e.g., [Mamba.jl](https://github.com/brian-j-smith/Mamba.jl) or [Turing.jl](https://turinglang.org/). For computational efficiency, ensemble methods also provide approximate sampling (e.g., the Ensemble Kalman Sampler [@Garbuno-Inigo:2020b;@Dunbar:2022a]) though these only provide Gaussian approximations of the posterior.

Accelerated uncertainty quantification tools also exist for the related approach of Approximate Bayesian Computation (ABC), e.g., GpABC [@Tankhilevich:2020] or [ApproxBayes.jl](https://github.com/marcjwilliams1/ApproxBayes.jl?tab=readme-ov-file); these tools both approximately sample from the posterior distribution. In ABC, this approximation comes from bypassing the likelihood that is usually required in sampling methods, such as MCMC. Instead, the goal of ABC is to replace the likelihood with a scalar-valued sampling objective that compares model and data. In CES, the approximation comes from learning the parameter-to-data map, then following this it calculates an explicit likelihood and uses exact sampling via MCMC. Some ABC algorithms also make use of statistical emulators to further accelerate sampling (GpABC). ABC can be used in more general contexts than CES, but suffers greater approximation error and more stringent assumptions, especially in multi-dimensional problems.
Accelerated uncertainty quantification tools also exist for the related approach of Approximate Bayesian Computation (ABC), e.g., GpABC [@Tankhilevich:2020] or [ApproxBayes.jl](https://github.com/marcjwilliams1/ApproxBayes.jl?tab=readme-ov-file); these tools both approximately sample from the posterior distribution. In ABC, this approximation comes from bypassing the likelihood that is usually required in sampling methods, such as MCMC. Instead, the goal of ABC is to replace the likelihood with a scalar-valued sampling objective that compares model and data. In CES, the approximation comes from learning the parameter-to-data map, then using this map to compute an explicit likelihood and sample exactly via MCMC. Some ABC algorithms also make use of statistical emulators to further accelerate sampling (GpABC). Although flexible, ABC encounters challenges due to the subjectivity of summary statistics and distance metrics, which may lead to approximation errors, particularly in high-dimensional settings [@Nott:2018]. CES is more restrictive due to its use of an explicit Gaussian likelihood, but also leverages this structure to deal with high-dimensional data.

Several other tools are available in other languages for the purpose of accelerated learning of the posterior distribution or posterior sampling. Two such examples, written in python, approximate the log-posterior distribution directly with a Gaussian process: [PyVBMC](https://github.com/acerbilab/pyvbmc) [@Huggins:2023] additionally uses variational approximations to calculate the normalization constant, while [GPry](https://github.com/jonaselgammal/GPry) [@Gammal:2022] iteratively trains the GP with an active training point selection algorithm. Such algorithms are distinct from CES, which approximates the parameter-to-data map with the Gaussian process, and advocates ensemble Kalman methods to select training points.

Expand Down
Binary file modified sinusoid_GP_emulator_contours.png
Binary file modified sinusoid_MCMC_hist_GP.png
Binary file modified sinusoid_eki_pairs.png
Binary file modified sinusoid_true_vs_observed_signal.png
