Skip to content

Commit

Permalink
remove old observations framework (#312)
Browse files Browse the repository at this point in the history
removed unused functionality based on old Observations

typo docs

example typo

format

remove deprecated kw

rm deprecated kw

qualify get_rng

better qualifying get_rng

properly override get_rng

format

try

try to resolve buildkite

add scope

scoping fixed in examples

use shrinkage from EKP

use getter for obs noise cov

getter

shrinkage back

working cloudy example

rm explicit EKP
  • Loading branch information
odunbar authored Aug 1, 2024
1 parent f2e95cf commit b8338a0
Show file tree
Hide file tree
Showing 16 changed files with 243 additions and 250 deletions.
363 changes: 199 additions & 164 deletions docs/Manifest.toml

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/src/API/RandomFeatures.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ get_batch_sizes
get_n_features
get_input_dim
get_output_dim
get_rng
EKP.get_rng
get_kernel_structure
get_feature_decomposition
get_optimizer_options
Expand Down
17 changes: 12 additions & 5 deletions docs/src/examples/Cloudy_example.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,10 @@
!!! info "How do I run this code?"
The full code is found in the [`examples/`](https://github.com/CliMA/CalibrateEmulateSample.jl/tree/main/examples) directory of the github repository

!!! warn "version control for Cloudy"
Due to rapid developments in Cloudy, this example will not work with the latest version. It is known to work pinned to specific commit `b4fa7e3`, please add Cloudy to the example Project using command `add Cloudy#b4fa7e3` in `Pkg` to avoid errors.


This example is based on [Cloudy](https://github.com/CliMA/Cloudy.jl.git), a
microphysics model that simulates how cloud droplets collide and coalesce into
larger drops. Collision-coalescence is a crucial process for the formation of
Expand Down Expand Up @@ -76,7 +80,6 @@ and finally the EKP packages.

```julia
using EnsembleKalmanProcesses
using EnsembleKalmanProcesses.Observations
using EnsembleKalmanProcesses.ParameterDistributions
using EnsembleKalmanProcesses.DataContainers
using EnsembleKalmanProcesses.PlotRecipes
Expand Down Expand Up @@ -163,8 +166,13 @@ for i in 1:n_samples
y_t[:, i] = G_t .+ rand(MvNormal(μ, Γy))
end

truth = Observations.Observation(y_t, Γy, data_names)
truth_sample = truth.mean
truth = Observation(
Dict(
"samples" => vec(mean(y_t, dims = 2)),
"covariances" => Γy,
"names" => data_names,
)
)
```

#### Perform ensemble Kalman inversion
Expand All @@ -181,8 +189,7 @@ N_iter = 8 # number of EKI iterations
initial_params = construct_initial_ensemble(rng, priors, N_ens)
ekiobj = EnsembleKalmanProcess(
initial_params,
truth_sample,
truth.obs_noise_cov,
truth,
Inversion(),
scheduler=DataMisfitController()
)
Expand Down
1 change: 0 additions & 1 deletion docs/src/examples/lorenz_example.md
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,6 @@ using CalibrateEmulateSample.Utilities
using CalibrateEmulateSample.EnsembleKalmanProcesses
using CalibrateEmulateSample.ParameterDistributions
using CalibrateEmulateSample.DataContainers
using CalibrateEmulateSample.Observations
```

The first input settings define which input-output pairs to use for training the emulator. The Calibrate stage (run using `calibrate.jl`) generates parameter-to-data pairs by running the L96 system using an iterative optimization approach (`EnsembleKalmanProcess.jl`). So we first define which iterations we would like to use data from for our emulator training
Expand Down
31 changes: 13 additions & 18 deletions examples/Cloudy/Cloudy_calibrate.jl
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
# Reference the in-tree version of CalibrateEmulateSample on Julias load path
include(joinpath(@__DIR__, "../", "ci", "linkfig.jl"))

@info "This experiment is very sensitive to the Cloudy version. It is known to work with Cloudy commit: b4fa7e3"

# Import modules
using Distributions
using StatsBase
Expand All @@ -18,12 +20,12 @@ using Cloudy.KernelTensors
# Import the module that runs Cloudy
include(joinpath(@__DIR__, "DynamicalModel.jl"))

# Import Ensemble Kalman Processes modules
using EnsembleKalmanProcesses
using EnsembleKalmanProcesses.Observations
using EnsembleKalmanProcesses.ParameterDistributions
using EnsembleKalmanProcesses.DataContainers
using EnsembleKalmanProcesses.PlotRecipes
# Import Ensemble Kalman Processes modules via CES
using CalibrateEmulateSample
using CalibrateEmulateSample.EnsembleKalmanProcesses
using CalibrateEmulateSample.EnsembleKalmanProcesses.ParameterDistributions
using CalibrateEmulateSample.EnsembleKalmanProcesses.DataContainers
using CalibrateEmulateSample.EnsembleKalmanProcesses.PlotRecipes


################################################################################
Expand Down Expand Up @@ -100,7 +102,7 @@ savefig(p, output_directory * "cloudy_priors.png")
### Define the data from which we want to learn the parameters
###

data_names = ["M0", "M1", "M2"]
data_names = ["M0_M1_M2"]
moments = [0.0, 1.0, 2.0]
n_moments = length(moments)

Expand Down Expand Up @@ -139,8 +141,7 @@ for i in 1:n_samples
y_t[:, i] = G_t .+ rand(MvNormal(μ, Γy))
end

truth = Observations.Observation(y_t, Γy, data_names)
truth_sample = truth.mean
truth = Observation(Dict("samples" => vec(mean(y_t, dims = 2)), "covariances" => Γy, "names" => data_names))


###
Expand All @@ -151,13 +152,7 @@ N_ens = 50 # number of ensemble members
N_iter = 15 # number of EKI iterations
# initial parameters: n_params x N_ens
initial_params = construct_initial_ensemble(rng, priors, N_ens)
ekiobj = EnsembleKalmanProcess(
initial_params,
truth_sample,
truth.obs_noise_cov,
Inversion(),
scheduler = DataMisfitController(),
)
ekiobj = EnsembleKalmanProcess(initial_params, truth, Inversion(), scheduler = DataMisfitController())

# Initialize a ParticleDistribution with dummy parameters. The parameters
# will then be set within `run_dyn_model`
Expand Down Expand Up @@ -196,9 +191,9 @@ save(
"eki",
ekiobj,
"truth_sample",
truth_sample,
get_obs(truth),
"truth_sample_mean",
truth.mean,
vec(mean(y_t, dims = 2)),
"truth_input_constrained",
ϕ_true,
)
Expand Down
5 changes: 2 additions & 3 deletions examples/Cloudy/Cloudy_emulate_sample.jl
Original file line number Diff line number Diff line change
Expand Up @@ -95,7 +95,7 @@ function main()
n_params = length(ϕ_true) # input dimension
n_outputs = length(truth_sample) # output dimension

Γy = ekiobj.obs_noise_cov
Γy = get_obs_noise_cov(ekiobj)

cases = [
"rf-scalar",
Expand Down Expand Up @@ -281,8 +281,7 @@ function main()
# Adding a vertical line for the true value
vlines!(ax, [ϕ_true[idx]], color = :indigo, linewidth = 2.6, label = "true " * param_names[idx])

xlims!(ax, xmin, xmax)
ylims!(ax, 0, nothing)


# Setting title and labels
ax.xlabel = "Value"
Expand Down
8 changes: 3 additions & 5 deletions examples/Cloudy/Project.toml
Original file line number Diff line number Diff line change
@@ -1,22 +1,20 @@
[deps]
CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
CalibrateEmulateSample = "95e48a1f-0bec-4818-9538-3db4340308e3"
Cloudy = "9e3b23bb-e7cc-4b94-886c-65de2234ba87"
DifferentialEquations = "0c46a032-eb83-5123-abaf-570d42b7fbaa"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
DocStringExtensions = "ffbed154-4ef7-542d-bbb7-c09d3a79fcae"
GaussianProcesses = "891a1506-143c-57d2-908e-e1f8e92e6de9"
JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
PairPlots = "43a3c2be-4208-490b-832a-a21dcd55d7da"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
CalibrateEmulateSample = "95e48a1f-0bec-4818-9538-3db4340308e3"
EnsembleKalmanProcesses = "aa8a2aa5-91d8-4396-bcef-d4f2ec43552d"
JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"


[compat]
Cloudy = "0.2"
julia = "~1.6"
Cloudy = "0.3"
2 changes: 1 addition & 1 deletion examples/Lorenz/Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -16,5 +16,5 @@ StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"

[compat]
FFTW = "= 1.3.0"
FFTW = "1.3"
julia = "~1.6"
12 changes: 5 additions & 7 deletions examples/Lorenz/calibrate.jl
Original file line number Diff line number Diff line change
Expand Up @@ -119,7 +119,7 @@ function main()
###
### Define the data from which we want to learn the parameters
###
data_names = ["y0", "y1"]
data_names = ["y0_y1"]


###
Expand Down Expand Up @@ -254,8 +254,7 @@ function main()


# Construct observation object
truth = Observations.Observation(yt, Γy, data_names)
truth_sample = yt[:, end]
truth = EKP.Observation(Dict("samples" => vec(mean(yt, dims = 2)), "covariances" => Γy, "names" => data_names))
###
### Calibrate: Ensemble Kalman Inversion
###
Expand All @@ -271,8 +270,7 @@ function main()

ekiobj = EKP.EnsembleKalmanProcess(
initial_params,
truth_sample,
truth.obs_noise_cov,
truth,
EKP.Inversion(),
scheduler = EKP.DataMisfitController(),
verbose = true,
Expand Down Expand Up @@ -316,9 +314,9 @@ function main()
"eki",
ekiobj,
"truth_sample",
truth_sample,
EKP.get_obs(truth),
"truth_sample_mean",
truth.mean,
vec(mean(yt, dims = 2)),
"truth_input_constrained",
params_true, #constrained here, as these are in a physically constrained space (unlike the u inputs),
)
Expand Down
3 changes: 1 addition & 2 deletions examples/Lorenz/emulate_sample.jl
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,6 @@ using CalibrateEmulateSample.Utilities
using CalibrateEmulateSample.EnsembleKalmanProcesses
using CalibrateEmulateSample.ParameterDistributions
using CalibrateEmulateSample.DataContainers
using CalibrateEmulateSample.Observations

function get_standardizing_factors(data::Array{FT, 2}) where {FT}
# Input: data size: N_data x N_ensembles
Expand Down Expand Up @@ -95,7 +94,7 @@ function main()
truth_sample = load(data_save_file)["truth_sample"]
truth_params_constrained = load(data_save_file)["truth_input_constrained"] #true parameters in constrained space
truth_params = transform_constrained_to_unconstrained(priors, truth_params_constrained)
Γy = ekiobj.obs_noise_cov
Γy = get_obs_noise_cov(ekiobj)


n_params = length(truth_params) # "input dim"
Expand Down
4 changes: 2 additions & 2 deletions src/CalibrateEmulateSample.jl
Original file line number Diff line number Diff line change
Expand Up @@ -10,9 +10,9 @@ module CalibrateEmulateSample
using Distributions, Statistics, LinearAlgebra, DocStringExtensions

# imported modules from EKP.
import EnsembleKalmanProcesses: EnsembleKalmanProcesses, ParameterDistributions, Observations, DataContainers
import EnsembleKalmanProcesses: EnsembleKalmanProcesses, ParameterDistributions, DataContainers

export EnsembleKalmanProcesses, ParameterDistributions, Observations, DataContainers
export EnsembleKalmanProcesses, ParameterDistributions, DataContainers


# Internal deps, light external deps
Expand Down
5 changes: 2 additions & 3 deletions src/ScalarRandomFeature.jl
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,7 @@ $(DocStringExtensions.TYPEDSIGNATURES)
gets the rng field
"""
get_rng(srfi::ScalarRandomFeatureInterface) = srfi.rng
EKP.get_rng(srfi::ScalarRandomFeatureInterface) = srfi.rng

"""
$(DocStringExtensions.TYPEDSIGNATURES)
Expand Down Expand Up @@ -447,8 +447,7 @@ function build_models!(
)
inflation = optimizer_options["inflation"]
if inflation > 0
terminated =
EKP.update_ensemble!(ekiobj, g_ens, additive_inflation = true, use_prior_cov = true, s = inflation) # small regularizing inflation
terminated = EKP.update_ensemble!(ekiobj, g_ens, additive_inflation = true, s = inflation) # small regularizing inflation
else
terminated = EKP.update_ensemble!(ekiobj, g_ens) # small regularizing inflation
end
Expand Down
29 changes: 0 additions & 29 deletions src/Utilities.jl
Original file line number Diff line number Diff line change
Expand Up @@ -5,13 +5,11 @@ using LinearAlgebra
using Statistics
using StatsBase
using Random
using ..Observations
using ..EnsembleKalmanProcesses
EnsembleKalmanProcess = EnsembleKalmanProcesses.EnsembleKalmanProcess
using ..DataContainers

export get_training_points
export get_obs_sample
export orig2zscore
export zscore2orig
"""
Expand Down Expand Up @@ -50,33 +48,6 @@ function get_training_points(
return training_points
end


"""
$(DocStringExtensions.TYPEDSIGNATURES)
Return a random sample from the observations, for use in the MCMC.
- `rng` - optional RNG object used to pick random sample; defaults to `Random.GLOBAL_RNG`.
- `obs` - Observation struct with the observations (extract will pick one
of the sample observations to train).
- `rng_seed` - optional kwarg; if provided, used to re-seed `rng` before sampling.
"""
function get_obs_sample(
rng::Random.AbstractRNG,
obs::Observation;
rng_seed::Union{IT, Nothing} = nothing,
) where {IT <: Int}
# Ensuring reproducibility of the sampled parameter values:
# re-seed the rng *only* if we're given a seed
if rng_seed !== nothing
rng = Random.seed!(rng, rng_seed)
end
row_idxs = StatsBase.sample(rng, axes(obs.samples, 1), 1; replace = false, ordered = false)
return obs.samples[row_idxs...]
end
# first arg optional; defaults to GLOBAL_RNG (as in Random, StatsBase)
get_obs_sample(obs::Observation; kwargs...) = get_obs_sample(Random.GLOBAL_RNG, obs; kwargs...)

function orig2zscore(X::AbstractVector{FT}, mean::AbstractVector{FT}, std::AbstractVector{FT}) where {FT}
# Compute the z scores of a vector X using the given mean
# and std
Expand Down
5 changes: 2 additions & 3 deletions src/VectorRandomFeature.jl
Original file line number Diff line number Diff line change
Expand Up @@ -96,7 +96,7 @@ $(DocStringExtensions.TYPEDSIGNATURES)
Gets the rng field
"""
get_rng(vrfi::VectorRandomFeatureInterface) = vrfi.rng
EKP.get_rng(vrfi::VectorRandomFeatureInterface) = vrfi.rng

"""
$(DocStringExtensions.TYPEDSIGNATURES)
Expand Down Expand Up @@ -572,8 +572,7 @@ function build_models!(
end
inflation = optimizer_options["inflation"]
if inflation > 0
terminated =
EKP.update_ensemble!(ekiobj, g_ens, additive_inflation = true, use_prior_cov = true, s = inflation) # small regularizing inflation
terminated = EKP.update_ensemble!(ekiobj, g_ens, additive_inflation = true, s = inflation) # small regularizing inflation
else
terminated = EKP.update_ensemble!(ekiobj, g_ens) # small regularizing inflation
end
Expand Down
1 change: 0 additions & 1 deletion test/RandomFeature/runtests.jl
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,6 @@ using CalibrateEmulateSample.Emulators
using CalibrateEmulateSample.DataContainers
using CalibrateEmulateSample.EnsembleKalmanProcesses
using CalibrateEmulateSample.ParameterDistributions
using RandomFeatures

seed = 10101010
rng = Random.MersenneTwister(seed)
Expand Down
5 changes: 0 additions & 5 deletions test/Utilities/runtests.jl
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,6 @@ using Statistics
using LinearAlgebra

using CalibrateEmulateSample.Utilities
using CalibrateEmulateSample.Observations
using CalibrateEmulateSample.EnsembleKalmanProcesses
using CalibrateEmulateSample.DataContainers

Expand All @@ -15,10 +14,6 @@ using CalibrateEmulateSample.DataContainers

arr = vcat([i * ones(3)' for i in 1:5]...)
arr_t = permutedims(arr, (2, 1))
data_names = ["d1", "d2", "d3"]
obs = Observation(arr_t, data_names) #data must be columns as default
sample = get_obs_sample(rng, obs)
@test sample == [5.0, 5.0, 5.0]

mean_arr = dropdims(mean(arr, dims = 1), dims = 1)
std_arr = dropdims(std(arr, dims = 1), dims = 1)
Expand Down

0 comments on commit b8338a0

Please sign in to comment.