# Random Feature Emulator

!!! note "Have a go with Gaussian processes first"
    We recommend that users first try `GaussianProcess` for their problems. As random features are a more recent tool, the training procedures and interfaces are still experimental and in development.

Random features provide a flexible framework to approximate a Gaussian process. Using random sampling of features, the method forms a low-rank approximation, leading to advantageous scaling properties (with the number of training points, input dimensions, and output dimensions). In the infinite-sample limit, there are often (known) explicit Gaussian process kernels to which the random feature representation converges.
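For intuition, here is a minimal sketch (not this package's implementation) of the classical random Fourier feature construction, where cosine features with Gaussian-sampled frequencies converge to the RBF kernel as the number of features grows:

```julia
using LinearAlgebra, Random

# approximate the RBF kernel exp(-norm(x - y)^2 / (2 * lengthscale^2))
# with randomly sampled cosine features (Rahimi & Recht, 2007)
function rff_kernel(x, y, n_features; lengthscale = 1.0, rng = Random.default_rng())
    d = length(x)
    Ω = randn(rng, n_features, d) ./ lengthscale # random frequencies
    b = 2π .* rand(rng, n_features)              # random phases
    φ(z) = sqrt(2 / n_features) .* cos.(Ω * z .+ b)
    return dot(φ(x), φ(y))
end

x, y = randn(3), randn(3)
rff_kernel(x, y, 100_000) # ≈ exp(-norm(x - y)^2 / 2)
```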

We provide two types of `MachineLearningTool` for random feature emulation, the `ScalarRandomFeatureInterface` and the `VectorRandomFeatureInterface`.

The `ScalarRandomFeatureInterface` closely mimics the role of a `GaussianProcess` package, by training a scalar-output function distribution. It can be applied to multidimensional output problems (as with `GaussianProcess`) by relying on data processing tools, such as the decorrelation performed when the `decorrelate = true` keyword argument is provided to the `Emulator`.

The `VectorRandomFeatureInterface`, when applied to multidimensional problems, directly trains a function distribution between multi-dimensional spaces. This approach does not require the data processing used by the scalar method (though it can still be helpful). The resulting emulator can be cheaper to evaluate, but on the other hand its training can be more challenging and computationally expensive.
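
As a sketch of how the two interfaces are constructed (the dimensions and feature count here are illustrative choices, not defaults):

```julia
using CalibrateEmulateSample.Emulators

input_dim, output_dim = 5, 3
n_features = 200 # number of random features (illustrative)

# scalar interface: one scalar-output function distribution,
# applied output-by-output after decorrelation in the `Emulator`
srfi = ScalarRandomFeatureInterface(n_features, input_dim)

# vector interface: one function distribution mapping directly
# between the multidimensional input and output spaces
vrfi = VectorRandomFeatureInterface(n_features, input_dim, output_dim)
```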

## The `kernel_structure` keyword - for flexibility

The default kernel structures are:

```julia
scalar_default_kernel = SeparableKernel(LowRankFactor(Int(ceil(sqrt(input_dim)))), OneDimFactor())
vector_default_kernel = SeparableKernel(LowRankFactor(Int(ceil(sqrt(output_dim)))), LowRankFactor(Int(ceil(sqrt(output_dim)))))
```
!!! note "Relating covariance structure and training"
The parallels between random feature and gaussian process also extends to the hyperparameter learning. For example,
The parallels between random feature and Gaussian process also extends to the hyperparameter learning. For example,
- A `ScalarRandomFeatureInterface` with a `DiagonalFactor` input covariance structure approximates a Gaussian process with automatic relevance determination (ARD) kernel, where one learns a lengthscale in each dimension of the input space
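
A minimal sketch of building such an ARD-like kernel (assuming the `"diagonal"` string maps to `DiagonalFactor`; the dimension is illustrative):

```julia
using CalibrateEmulateSample.Emulators

input_dim = 5
# diagonal input covariance: one learnable scale per input dimension (ARD-like)
ard_like_kernel = SeparableKernel(
    cov_structure_from_string("diagonal", input_dim),
    cov_structure_from_string("onedim", 1),
)
calculate_n_hyperparameters(input_dim, ard_like_kernel)
build_default_prior(input_dim, ard_like_kernel)
```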

## The `optimizer_options` keyword - for performance
We suggest looking at the [`EnsembleKalmanProcesses`](https://github.com/CliMA/EnsembleKalmanProcesses.jl) documentation for more details on the optimizer configuration. In particular:
- If `n_e` becomes less than the number of hyperparameters, the updates will fail, and a localizer must be specified in `loc`.
- If the algorithm terminates at `T=1` and the resulting emulator looks unacceptable, one can change or add arguments in `sch`, e.g. `DataMisfitController(on_terminate = "continue")` (see the sketch below).
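
A sketch of overriding the scheduler in this way (the `"scheduler"` key name and the feature/dimension values are assumptions used to illustrate the pattern):

```julia
using CalibrateEmulateSample.Emulators
using EnsembleKalmanProcesses # provides DataMisfitController

# continue ensemble Kalman updates past the usual T = 1 termination time
optimizer_options = Dict(
    "scheduler" => DataMisfitController(on_terminate = "continue"),
)
srfi = ScalarRandomFeatureInterface(200, 5; optimizer_options = optimizer_options)
```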

!!! note
    Widely robust defaults here are a work in progress.

## Key methods

To interact with the kernel/covariance structures we have standard `get_*` methods, along with the following:
- `calculate_n_hyperparameters(in_dim, out_dim, kernel_structure)` calculates the number of hyperparameters created by using the given kernel structure (it can also be applied to each covariance structure individually)
- `build_default_prior(in_dim, out_dim, kernel_structure)` creates a `ParameterDistribution` for the hyperparameters based on the kernel structure. This serves as the initialization of the training procedure.

## Example families and their hyperparameters

### Scalar: ``\mathbb{R}^5 \to \mathbb{R}`` at defaults
```julia
using CalibrateEmulateSample.Emulators
input_dim = 5
# build the default scalar kernel directly (here it will be a rank-3 perturbation from the identity)
scalar_default_kernel = SeparableKernel(
    cov_structure_from_string("lowrank", input_dim),
    cov_structure_from_string("onedim", 1),
)

calculate_n_hyperparameters(input_dim, scalar_default_kernel)
build_default_prior(input_dim, scalar_default_kernel)
# 15-dim unbounded distribution `input_lowrank_U`
# 1-dim positive distribution `sigma`
```
### Vector, separable: ``\mathbb{R}^{25} \to \mathbb{R}^{50}`` at defaults
Or take a diagonalized 8-dimensional input, and assume a full 6-dimensional output:

```julia
build_default_prior(input_dim, output_dim, vector_default_kernel)
# 1-dim positive distribution `sigma`
```
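
To train with one of these kernels, it can be passed to an interface at construction; a sketch (the feature count is illustrative):

```julia
# pass the kernel into an interface via the `kernel_structure` keyword
vrfi = VectorRandomFeatureInterface(
    300,
    input_dim,
    output_dim;
    kernel_structure = vector_default_kernel,
)
```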

### Vector, nonseparable: ``\mathbb{R}^{25} \to \mathbb{R}^{50}``
The following represents the most general kernel case.

!!! note "Use low-rank/diagonls representations where possible"
The following is far too general, leading to large numbers of hyperparameters
```julia
using CalibrateEmulateSample.Emulators
input_dim = 25
output_dim = 50
# a sketch of the most general construction, assuming a full Cholesky-factorized
# covariance over the (input_dim * output_dim)-dimensional product space
vector_general_kernel = NonseparableKernel(cov_structure_from_string("cholesky", input_dim * output_dim))

calculate_n_hyperparameters(input_dim, output_dim, vector_general_kernel)
# the count grows like (input_dim * output_dim)^2 / 2 - far too many to train
```
