Commit 7b61f34: update paper
mastoffel committed Nov 29, 2024 (1 parent: 5ca9376)
Showing 1 changed file with 14 additions and 15 deletions: paper/paper.md
@@ -54,22 +54,21 @@ AutoEmulate automates emulator building, with the goal to eventually streamline

# Pipeline

The inputs for AutoEmulate are X and y, where X is a 2D array (e.g. a NumPy array or Pandas DataFrame) containing one simulation parameter per column and their values in rows, and y is an array containing the corresponding simulation outputs. A dataset X, y is usually constructed by sampling input parameters X using Latin Hypercube Sampling [@mckay_comparison_1979] and evaluating the simulation on these inputs to obtain outputs y. With X and y, we can create an emulator with AutoEmulate in just a few lines of code.
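
For illustration, here is a minimal sketch of this dataset-construction step, using SciPy's Latin Hypercube sampler; the toy `simulation` function and the parameter bounds are hypothetical stand-ins for a real simulator:

```python
import numpy as np
from scipy.stats import qmc

def simulation(x):
    # hypothetical stand-in for an expensive simulator with two parameters
    return np.sin(x[0]) + 0.5 * x[1] ** 2

# draw 100 Latin Hypercube samples and rescale them to the parameter ranges
sampler = qmc.LatinHypercube(d=2, seed=0)
X = qmc.scale(sampler.random(n=100), l_bounds=[0.0, 0.0], u_bounds=[np.pi, 1.0])
y = np.array([simulation(x) for x in X])  # evaluate the simulation on each input
```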

```python
from autoemulate.compare import AutoEmulate

ae = AutoEmulate()
ae.setup(X, y) # customise pipeline
ae.compare() # runs the pipeline
```

Under the hood, AutoEmulate runs a complete ML pipeline. It splits the data into training and test sets, standardises inputs, fits a set of user-specified emulators, compares them using cross-validation and optionally optimises hyperparameters using pre-defined search spaces. All these steps can be customised in `setup()`. After running `compare()`, the cross-validation results can be visualised and summarised.
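
As a hedged sketch of such customisation (the keyword arguments `models`, `folds` and `param_search` below are assumptions about the `setup()` interface rather than its documented API; consult the package documentation for the exact names):

```python
ae = AutoEmulate()
ae.setup(
    X, y,
    models=["GaussianProcess", "RandomForest"],  # assumed kwarg: emulators to compare
    folds=5,                                     # assumed kwarg: cross-validation folds
    param_search=True,                           # assumed kwarg: tune hyperparameters
)
ae.compare()
```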

```python
ae.plot_cv() # visualise results
ae.summarise_cv() # metrics for each model
```

: Average cross-validation scores
@@ -85,22 +84,22 @@ ae.summarise_cv() # metrics for each model
| LightGBM | lgbm | 0.6044 | 0.4930 |
| Second Order Polynomial | sop | 0.8378 | 0.0297 |
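
Assuming the two numeric columns report root mean squared error and the coefficient of determination, these are the standard definitions:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},
$$

where $y_i$ is a simulation output, $\hat{y}_i$ the emulator prediction and $\bar{y}$ the mean output.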

After an emulator has been chosen based on cross-validation metrics and visualisations, it can be evaluated on the test set, which defaults to 20% of the original dataset.

```python
emulator = ae.get_model("GaussianProcess")
ae.evaluate(emulator) # calculate test set scores
ae.plot_eval(emulator) # visualise test set predictions
```

![Test set predictions](eval_2.png)

If the test-set performance is acceptable, the emulator can be refitted on the combined training and test data before applying it. It's now ready to be used as an efficient replacement for the original simulation, generating tens of thousands of new data points in seconds using `predict()`. We have also implemented global sensitivity analysis, a common use case for emulators, which decomposes the variance in the outputs into the contributions of the various simulation parameters and their interactions.

```python
emulator = ae.refit(emulator) # refit using full data
emulator.predict(X) # emulate!
ae.sensitivity_analysis(emulator) # global SA with Sobol indices
```
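
As a hypothetical downstream use, matching the toy two-parameter setup sketched earlier, the refitted emulator can stand in for the simulation across a dense parameter sweep:

```python
import numpy as np
from scipy.stats import qmc

# densely sample the (hypothetical) input space and predict with the emulator
sampler = qmc.LatinHypercube(d=2, seed=1)
X_new = qmc.scale(sampler.random(n=10_000), l_bounds=[0.0, 0.0], u_bounds=[np.pi, 1.0])
y_new = emulator.predict(X_new)  # fast approximate simulation outputs
```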

# Acknowledgements
