update paper
mastoffel committed Dec 2, 2024
1 parent c31d1d8 commit 32db6f3
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions paper/paper.md
@@ -52,7 +52,7 @@ To understand complex real-world systems, researchers and engineers often constr

A typical emulation pipeline involves three steps: 1. evaluating the simulation at a small, strategically chosen set of inputs, using techniques such as Latin Hypercube Sampling [@mckay_comparison_1979], to create a representative dataset, 2. constructing a high-accuracy emulator from that dataset, which involves model selection, hyperparameter optimisation and evaluation, and 3. applying the emulator to tasks such as prediction, sensitivity analysis, or optimisation. Building an emulator in particular requires substantial machine learning experience and knowledge of an ever-increasing ecosystem of models and packages. This puts a considerable burden on practitioners whose main focus is to explore the underlying system, not to build emulators.

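As an illustration of step 1, a space-filling design of inputs can be generated and run through the simulation. The sketch below uses scipy's `qmc` module and a stand-in `simulate` function; both are assumptions for illustration only and are not part of AutoEmulate:

```python
import numpy as np
from scipy.stats import qmc  # assumption: scipy is available for the sampling step

def simulate(x):
    # stand-in for an expensive simulation with two input parameters
    return np.sin(x[0]) + 0.5 * x[1] ** 2

# step 1: a small, strategically chosen set of inputs via Latin Hypercube Sampling
sampler = qmc.LatinHypercube(d=2, seed=42)
X = qmc.scale(sampler.random(n=100), l_bounds=[0.0, 0.0], u_bounds=[1.0, 10.0])
y = np.array([simulate(x) for x in X])  # evaluate the simulation at each design point
```
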
- AutoEmulate automates emulator building, with the goal of eventually streamlining the whole emulation pipeline. For people new to ML, AutoEmulate compares, optimises and evaluates a range of models to create an efficient emulator for their simulation in just a few lines of code. For experienced surrogate modellers, AutoEmulate provides a reference set of cutting-edge emulators to quickly benchmark new models against. The package includes classic emulators such as Radial Basis Functions and Gaussian Processes, established ML models like Gradient Boosting and Support Vector Machines, as well as experimental deep learning emulators such as [Conditional Neural Processes](https://yanndubs.github.io/Neural-Process-Family/text/Intro.html) [@garnelo_conditional_2018]. AutoEmulate is built to be extensible. Emulators follow the popular [scikit-learn estimator template](https://scikit-learn.org/1.5/developers/develop.html#rolling-your-own-estimator) and deep learning models are supported with little overhead using PyTorch [@paszke_pytorch_2019] with a skorch [@tietz_skorch_2017] interface.
+ AutoEmulate automates emulator building, with the goal of eventually streamlining the whole emulation pipeline. For people new to ML, AutoEmulate compares, optimises and evaluates a range of models to create an efficient emulator for their simulation in just a few lines of code. For experienced surrogate modellers, AutoEmulate provides a reference set of cutting-edge emulators to quickly benchmark new models against. The package includes classic emulators such as Radial Basis Functions and Gaussian Processes, established ML models like Gradient Boosting and Support Vector Machines, as well as experimental deep learning emulators such as [Conditional Neural Processes](https://yanndubs.github.io/Neural-Process-Family/text/Intro.html) [@garnelo_conditional_2018]. AutoEmulate is built to be extensible. Emulators follow the popular [scikit-learn estimator template](https://scikit-learn.org/1.5/developers/develop.html#rolling-your-own-estimator) and PyTorch [@paszke_pytorch_2019] deep learning models are supported with little overhead using a [skorch](https://skorch.readthedocs.io/en/stable/) [@tietz_skorch_2017] interface.

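For context, the "few lines of code" workflow referred to above might look like the following sketch. The `AutoEmulate` class and its `setup` and `compare` methods are assumptions inferred from the package's documented interface and the `ae` object used in the snippets below, not text from this diff:

```python
# minimal sketch of an AutoEmulate workflow; class and method names are assumptions
from autoemulate.compare import AutoEmulate

ae = AutoEmulate()
ae.setup(X, y)             # X, y: simulation inputs and outputs from step 1
best_model = ae.compare()  # cross-validate, tune and rank a range of emulators
```
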
AutoEmulate fills a gap in the current landscape of surrogate modelling tools: it is both highly accessible to newcomers and provides cutting-edge emulators for experienced surrogate modellers. In contrast, existing libraries either focus on lower-level implementations of specific models, like GPflow [@matthews_gpflow_2017] and GPyTorch [@gardner_gpytorch_2018], or provide multiple emulators and applications but require users to manually pre-process data, compare emulators and optimise hyperparameters, like SMT in Python [@saves_smt_2024] or [Surrogates.jl](https://docs.sciml.ai/Surrogates/latest/) in Julia.

@@ -93,12 +93,12 @@ After comparing cross-validation metrics and plots, an emulator can be selected
```python
emulator = ae.get_model("GaussianProcess") # select fitted emulator
ae.evaluate(emulator) # calculate test set scores
- ae.plot_eval(emulator, input_index=[0, 1]) # visualise test set predictions
+ ae.plot_eval(emulator, input_index=[0, 1]) # plot predictions
```

![Test set predictions for each input](eval_2.png)

- Finally, the emulator can be refitted on the combined training and test set data before applying it. It's now ready to be used as an efficient replacement for the original simulation, and is able to generate tens of thousands of new data points in negligible time using predict(). We also implemented global sensitivity analysis as a common emulator application, which decomposes the variance in the outputs into the contributions of the various simulation parameters and their interactions.
+ Finally, the emulator can be refitted on the combined training and test set data before applying it. It's now ready to be used as an efficient replacement for the original simulation, able to generate tens of thousands of new data points in negligible time using `predict()`. In addition, we implemented global sensitivity analysis, a common emulator application, which quantifies how the simulation parameters and their interactions contribute to the variance in the outputs.

```python
emulator = ae.refit(emulator) # refit using full data
```
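For completeness, downstream use of the refitted emulator might look like the sketch below. `emulator.predict` follows the scikit-learn estimator interface mentioned earlier, while `ae.sensitivity_analysis` and `X_new` are hypothetical names standing in for the global sensitivity analysis step and new simulation inputs described above:

```python
# sketch only: `sensitivity_analysis` and `X_new` are hypothetical names
y_new = emulator.predict(X_new)         # cheap predictions for new inputs
si = ae.sensitivity_analysis(emulator)  # contribution of each parameter to output variance
```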
