From bda5aa319f51227c4b4de867a8a1c8e2802a94b1 Mon Sep 17 00:00:00 2001
From: mastoffel <martin.adam.stoffel@gmail.com>
Date: Thu, 21 Nov 2024 17:01:52 +0000
Subject: [PATCH 1/3] update faq-users

---
 docs/community/faq/faq-users.md | 59 ++++++++++++++-------------------
 1 file changed, 25 insertions(+), 34 deletions(-)

diff --git a/docs/community/faq/faq-users.md b/docs/community/faq/faq-users.md
index a6d9e464..bac5d7ec 100644
--- a/docs/community/faq/faq-users.md
+++ b/docs/community/faq/faq-users.md
@@ -4,36 +4,31 @@
 
 1. What is `AutoEmulate`?
    <!-- A brief description of what the package does, its main features, and its intended use case. -->
-   - A Python package that makes it easy to build emulators for complex simulations. It takes a set of simulation inputs `X` and outputs `y`, and automatically fits, optimises and evaluates various machine learning models to find the best emulator model. The emulator model can then be used as a drop-in replacement for the simulation, but will be much faster and computationally cheaper to evaluate. 
+   - A Python package that makes it easy to create emulators for complex simulations. It takes a set of simulation inputs `X` and outputs `y`, and automatically fits, optimises and evaluates various machine learning models to find the best emulator model. The emulator model can then be used as a drop-in replacement for the simulation, but will be much faster and computationally cheaper to evaluate. We have also implemented global sensitivity analysis, a common emulator application, and are working towards making `AutoEmulate` a true end-to-end package for building emulators.
 
-2. How do I install `AutoEmulate`?
-   <!-- Step-by-step instructions on installing the package, including any dependencies that might be required. -->
-   - See the [installation guide](../../getting-started/installation.md) for detailed instructions.
+2. How do I know whether `AutoEmulate` is the right tool for me?
+   - You need to build an emulator for a simulation.
+   - You want to do global sensitivity analysis.
+   - Your inputs `X` and outputs `y` are numeric and complete (we don't support missing data yet).
+   - You have one or more input parameters and one or more output variables.
+   - You have a small-ish dataset, on the order of hundreds to a few thousand samples. All default emulator parameters and search spaces are optimised for smaller datasets.
 
-3. What are the prerequisites for using `AutoEmulate`?
-   <!-- Information on the knowledge or data required to effectively use AutoEmulate, such as familiarity with Python, machine learning concepts, or specific data formats. -->
-   - `AutoEmulate` is designed to be easy to use. The user has to first generate a dataset of simulation inputs `X` and outputs `y`, and optimally have a basic understanding of Python and machine learning concepts.
+3. Does `AutoEmulate` support multi-output data?
+   - Yes, all models support multi-output data. Some do so natively, others are wrapped in a `MultiOutputRegressor`, which fits one model per target variable.
 
-## Usage Questions
-
-1. How do I start using `AutoEmulate` with my simulation?
-   <!-- A simple example to get a new user started, possibly pointing to more detailed tutorials or documentation. -->
-   - See the [getting started guide](../../getting-started/quickstart.ipynb) or a more [in-depth tutorial](../../tutorials/01_start.ipynb).
-
-2. What kind of data does `AutoEmulate` need to build an emulator?
-   <!-- Clarification on the types of datasets suitable for analysis, including data formats and recommended data sizes. -->
-
-   - `AutoEmulate` takes simulation inputs `X` and simulation outputs `y` to build an emulator.`X` is an ndarray of shape `(n_samples, n_parameters)` and `y` is an ndarray of shape `(n_samples, n_outputs)`. Each sample here is a simulation run, so each row of `X` corresponds to a set of input parameters and each row of `y` corresponds to the corresponding simulation output. Currently, all inputs and outputs should be numeric, and we don't support missing data.
+4. Does `AutoEmulate` support temporal or spatial data?
+   - Not explicitly. Both the train-test split and KFold cross-validation take random subsets of the data, so any temporal or spatial ordering is ignored.
 
-   - All models work with multi-output data. We have optimised `AutoEmulate` to work with smaller datasets (in the order of hundreds to thousands of samples). Training emulators with large datasets (hundreds of thousands of samples) may currently require a long time and is not recommended.
+5. Why is `AutoEmulate` so slow?
+   - The package fits a lot of models, in particular when hyperparameters are optimised. With, say, 8 default models and 5-fold cross-validation, this amounts to 40 model fits. Adding hyperparameter optimisation (`n_iter=20`) brings this to 800 model fits. Some models, such as Gaussian Processes and Neural Processes, will take a long time to run on a CPU. However, don't despair! There is a [speeding up AutoEmulate guide](../../tutorials/02_speed.ipynb). As a rule of thumb, if your dataset is smaller than 1000 samples, you should be fine; if it's larger and you want to optimise hyperparameters, you might want to read the guide.
 
-3. How do I interpret the results from `AutoEmulate`?
-   <!-- Guidance on understanding the output of the software, including any metrics or visualizations it produces. -->
-   - See the [tutorial](../../tutorials/01_start.ipynb) for an example of how to interpret the results from `AutoEmulate`. Briefly, `X` and `y` are first split into training and test sets. Cross-validation and/or hyperparameter optimisation are performed on the training data. After comparing the results from different emulators, the user can evaluate the chosen emulator on the test set with `AutoEmulate.evaluate_model()`, and plot test set predictions with `AutoEmulate.plot_model()`, see [autoemulate.compare](../../reference/compare.rst) module for details.
+## Usage Questions
 
-   - An important thing to note is that the emulator can only be as good as the data it was trained on. Therefore, the experimental design (on which points the simulation was evaluated) is key to obtaining a good emulator.
+1. What data do I need to provide to `AutoEmulate` to build an emulator?
+   <!-- A simple example to get a new user started, possibly pointing to more detailed tutorials or documentation. -->
+   - You'll need two input objects: `X` and `y`. `X` is an ndarray / Pandas DataFrame of shape `(n_samples, n_parameters)` and `y` is an ndarray / Pandas DataFrame of shape `(n_samples, n_outputs)`. Each sample here is a simulation run, so each row of `X` corresponds to a set of input parameters and each row of `y` corresponds to the corresponding simulation output. You'll usually have created `X` using Latin hypercube sampling or similar methods, and `y` by running the simulation on these `X` inputs.
 
-4. Can I use `AutoEmulate` for commercial purposes?
+2. Can I use `AutoEmulate` for commercial purposes?
    <!-- Information on licensing and any restrictions on use. -->
    - Yes. It's licensed under the MIT license, which allows for commercial use. See the [license](../../../LICENSE) for more information.
 
@@ -41,28 +36,24 @@
 
 1. Does AutoEmulate support parallel processing or high-performance computing (HPC) environments?
    <!-- Details on the software's capabilities to leverage multi-threading, distributed computing, or HPC resources to speed up computations. -->
-   - Yes, [AutoEmulate.setup()](../../reference/compare.rst) has an `n_jobs` parameter which allows to parallelise cross-validation and hyperparameter optimisation.
+   - Yes, [AutoEmulate.setup()](../../reference/compare.rst) has an `n_jobs` parameter which allows you to parallelise cross-validation and hyperparameter optimisation. We are also working on GPU support for some models.
 
 2. Can AutoEmulate be integrated with other data analysis or simulation tools?
    <!-- Information on APIs, file formats, or protocols that facilitate the integration of AutoEmulate with other software ecosystems. -->
-   - `AutoEmulate` takes simple `X` and `y` ndarrays as input, and returns emulator models that can be saved and loaded with `joblib`. All emulators are written as scikit learn estimators, so they can be used like any other scikit learn model in a pipeline.
+   - `AutoEmulate` takes simple `X` and `y` ndarrays as input and returns emulators that are [scikit-learn estimators](https://scikit-learn.org/1.5/developers/develop.html), which can be saved, loaded, and used like any other scikit-learn model.
 
 ## Data Handling
 
 1. What are the best practices for data preprocessing before using `AutoEmulate`?
    <!-- Tips and recommendations on preparing data, including normalisation, dealing with missing values, or data segmentation. -->
-   - The user will typically run their simulation on a selected set of input parameters (-> experimental design) using a latin hypercube or other sampling method. `AutoEmulate` currently needs all inputs to be numeric and we don't support missing data. By default, `AutoEmulate` will scale the input data to zero mean and unit variance, and there's the option to do dimensionality reduction in `setup()`.
-
-2. How does AutoEmulate handle large datasets?
-   <!-- Advice on managing large-scale data analyses, potential memory management features, or ways to streamline processing. -->
-   - `AutoEmulate` is optimised to work with smaller datasets (in the order of hundreds to thousands of samples). Training emulators with large datasets (hundreds of thousands of samples) may currently require a long time and is not recommended. Emulators are created because it's expensive to evaluate the simulation, so we expect most users to have a relatively small dataset.
+   - The user will typically run their simulation on a selected set of input parameters (the experimental design), chosen using a Latin hypercube or another sampling method. `AutoEmulate` currently needs all inputs to be numeric, and we don't support missing data. By default, `AutoEmulate` will scale the input data to zero mean and unit variance, and for some models it will also scale the output data. There's also the option to do dimensionality reduction in `setup()`.
 
 ## Troubleshooting
 
 1. What common issues might I encounter when using `AutoEmulate`, and how can I solve them?
    <!-- A list of frequently encountered problems with suggested solutions, possibly linked to a more extensive troubleshooting guide. -->
    - `AutoEmulate.setup()` has a `log_to_file` option to log all warnings/errors to a file. It also has a `verbose` option to print more information to the console. If you encounter an error, please open an issue (see below).
-
+   - One common issue is that the Jupyter notebook kernel crashes when running `compare()` in parallel, often due to `LightGBM`. In this case, we recommend either specifying `n_jobs=1` or selecting specific (non-LightGBM) models in `setup()` with the `models` parameter.
 2. How can I report a bug or request a feature in `AutoEmulate`?
    <!-- Instructions on the proper channels for reporting issues or suggesting enhancements, including any templates or information to include. -->
    - You can report a bug or request a new feature through the [issue templates](https://github.com/alan-turing-institute/autoemulate/issues/new/choose) in our GitHub repository. Head on over there and choose one of the templates for your purpose and get started.
@@ -71,11 +62,11 @@
 
 1. Are there any community projects or collaborations using `AutoEmulate` I can join or learn from?
    <!-- Information on community-led projects, study groups, or collaborative research initiatives involving AutoEmulate. -->
-   - Reach out to Martin ([email](mailto:mstoffel@turing.ac.uk)) or Kalle ([email](mailto:kwesterline@turing.ac.uk)) for more information.
+   - Reach out to Martin ([email](mailto:mstoffel@turing.ac.uk)) or Sophie ([email](mailto:sarana@turing.ac.uk)) for more information.
 
 2. Where can I find tutorials or case studies on using `AutoEmulate`?
    <!-- Directions to comprehensive learning materials, such as video tutorials (if we want to record that), written guides, or published research papers using AutoEmulate. -->
-   - See the [tutorial](../../tutorials/01_start.ipynb) for a comprehensive guide on using the package.
+   - See the [tutorial](../../tutorials/01_start.ipynb) for a comprehensive guide on using the package. Case studies are coming soon.
 
 3. How can I stay updated on new releases or updates to AutoEmulate?
    <!-- Guidance on subscribing to newsletters when/if we will have that, community calls if we start that, following the project on social media if we want to create those platforms, or joining community forums/Slack once we have that ready... -->
@@ -83,4 +74,4 @@
 
 4. What support options are available if I need help with AutoEmulate?
    <!-- Overview of support resources, including documentation, community forums/Slack when we have that ready... -->
-   - Please open an issue or contact the maintainer ([email](mailto:mstoffel@turing.ac.uk)) directly.
+   - Please open an issue on GitHub or contact the maintainer ([email](mailto:mstoffel@turing.ac.uk)) directly.

From 09ac608f00db23e376006b854182332343879d23 Mon Sep 17 00:00:00 2001
From: mastoffel <martin.adam.stoffel@gmail.com>
Date: Thu, 21 Nov 2024 17:50:25 +0000
Subject: [PATCH 2/3] update contributors faq

---
 docs/community/faq/faq-contributors.md | 72 ++++++++++++++++----------
 1 file changed, 46 insertions(+), 26 deletions(-)

diff --git a/docs/community/faq/faq-contributors.md b/docs/community/faq/faq-contributors.md
index f251fea4..6ed1acb0 100644
--- a/docs/community/faq/faq-contributors.md
+++ b/docs/community/faq/faq-contributors.md
@@ -1,45 +1,65 @@
 # First-Time Contributors' Frequently Asked Questions
 
-**TODO**
+## Technical Questions
 
-## Getting Started
+1. How is the AutoEmulate project structured?
+   <!-- An introduction to the project's architecture and where contributors can find key components. -->
+   * The key component is the `AutoEmulate` class in `autoemulate/compare.py`, which is the main class for setting up and comparing emulators, visualising and summarising results, saving models, and applications such as sensitivity analysis.
+   * All other modules in `autoemulate/` are supporting modules for the main class, such as data splitting, model processing, hyperparameter searching, plotting, saving, etc.
+   * `autoemulate/emulators/` contains the emulator models, which are implemented as [scikit-learn estimators](https://scikit-learn.org/1.5/developers/develop.html). Architectures for deep learning models are in `autoemulate/emulators/neural_networks/`, which feed into the emulators via [skorch](https://skorch.readthedocs.io/en/latest/?badge=latest).
+   * Emulators need to be registered in the model registry in `autoemulate/emulators/__init__.py` to be available in `AutoEmulate`.
+   * `autoemulate/simulations/` contains simple example simulations.
+   * `tests/` contains tests for the package.
+   * `data/` contains example datasets.
+   * `docs/` contains the documentation source files. We use `jupyter-book` to build the documentation.
 
-1. How can I contribute to AutoEmulate?
-   <!-- Overview of the ways to contribute, from code to documentation, and how to get started. -->
+2. How do I set up my development environment for AutoEmulate?
+   <!-- Steps to configure a local development environment, including any necessary tools or dependencies. -->
+   * Ensure have poetry installed. If not, install it following the [official instructions](https://python-poetry.org/docs/).
+   * Fork and clone the repository.
 
-2. What are the guidelines for contributing code?
-   <!-- Information on coding standards, the pull request process, and how contributions are reviewed. -->
+   ```bash
+   git clone https://github.com/alan-turing-institute/autoemulate.git
+   cd autoemulate
+   ```
 
-3. How do I choose what to work on for my first contribution?
-   <!-- Guidance on identifying beginner-friendly issues, selecting tasks based on personal expertise, or areas of the project that need the most help. -->
+   * Install the dependencies:
 
-4. What coding standards and practices does AutoEmulate follow?
-   <!-- Information on coding conventions, documentation standards, and testing practices contributors should adhere to. -->
+   ```bash
+   poetry install
+   ```
 
-5. Are there any specific development tools or environments recommended for working on AutoEmulate?
-   <!-- Suggestions for IDEs, code editors, version control systems, or other tools that facilitate development and contribute to the project. -->
+   * If needed, enter the shell (optional when working using an IDE which recognises poetry environments):
 
-## Making Contributions
+   ```bash
+   poetry shell
+   ```
 
-1. How do I submit a contribution, and what is the review process?
-   <!-- Step-by-step guide on creating pull requests, what happens after submission, how contributions are reviewed, and typical timelines for feedback. -->
+3. How do I run tests for AutoEmulate?
+   <!-- Instructions on how to execute the project's test suite to ensure changes do not introduce regressions. -->
+   * We use `pytest` to run the tests. To run all tests:
 
-2. Can I contribute by writing documentation or tutorials, and how?
-   <!-- Details on how to contribute to the project's documentation, tutorial creation, or translation efforts, including style guides or templates to follow. -->
+   ```bash
+   pytest
+   ```
 
-3. What should I do if my pull request gets rejected or needs revision?
-   <!-- Advice on how to handle feedback on contributions, including how to make requested changes and resubmit for review. -->
+   * To run tests and show the output of print statements:
 
-## Technical Questions
+   ```bash
+   pytest -s
+   ```
 
-1. How is the AutoEmulate project structured?
-   <!-- An introduction to the project's architecture and where contributors can find key components. -->
+   * To run a specific test module:
 
-2. How do I set up my development environment for AutoEmulate?
-   <!-- Steps to configure a local development environment, including any necessary tools or dependencies. -->
+   ```bash
+   pytest tests/test_example.py
+   ```
 
-3. How do I run tests for AutoEmulate?
-   <!-- Instructions on how to execute the project's test suite to ensure changes do not introduce regressions. -->
+   * To run a specific test:
+
+   ```bash
+   pytest tests/test_example.py::test_function
+   ```
 
 ## Community and Support
 

From 4f0d247e27a81c730ee707699a7dd383ad0c0527 Mon Sep 17 00:00:00 2001
From: mastoffel <martin.adam.stoffel@gmail.com>
Date: Fri, 22 Nov 2024 09:24:46 +0000
Subject: [PATCH 3/3] update contributors faq

---
 docs/community/faq/faq-contributors.md | 43 +++++---------------------
 docs/getting-started/installation.md   | 36 ++++++++++++++-------
 2 files changed, 32 insertions(+), 47 deletions(-)

diff --git a/docs/community/faq/faq-contributors.md b/docs/community/faq/faq-contributors.md
index 6ed1acb0..c312a62d 100644
--- a/docs/community/faq/faq-contributors.md
+++ b/docs/community/faq/faq-contributors.md
@@ -6,7 +6,7 @@
    <!-- An introduction to the project's architecture and where contributors can find key components. -->
    * The key component is the `AutoEmulate` class in `autoemulate/compare.py`, which is the main class for setting up and comparing emulators, visualising and summarising results, saving models, and applications such as sensitivity analysis.
    * All other modules in `autoemulate/` are supporting modules for the main class, such as data splitting, model processing, hyperparameter searching, plotting, saving, etc.
-   * `autoemulate/emulators/` contains the emulator models, which are implemented as [scikit-learn estimators](https://scikit-learn.org/1.5/developers/develop.html). Architectures for deep learning models are in `autoemulate/emulators/neural_networks/`, which feed into the emulators via [skorch](https://skorch.readthedocs.io/en/latest/?badge=latest).
+   * `autoemulate/emulators/` contains the emulator models, which are implemented as [scikit-learn estimators](https://scikit-learn.org/1.5/developers/develop.html). Deep learning models have two main parts: the scikit-learn estimator interface in `autoemulate/emulators/` and the neural network architecture in `autoemulate/emulators/neural_networks/`.
    * Emulators need to be registered in the model registry in `autoemulate/emulators/__init__.py` to be available in `AutoEmulate`.
    * `autoemulate/simulations/` contains simple example simulations.
    * `tests/` contains tests for the package.
@@ -15,25 +15,7 @@
 
 2. How do I set up my development environment for AutoEmulate?
    <!-- Steps to configure a local development environment, including any necessary tools or dependencies. -->
-   * Ensure have poetry installed. If not, install it following the [official instructions](https://python-poetry.org/docs/).
-   * Fork and clone the repository.
-
-   ```bash
-   git clone https://github.com/alan-turing-institute/autoemulate.git
-   cd autoemulate
-   ```
-
-   * Install the dependencies:
-
-   ```bash
-   poetry install
-   ```
-
-   * If needed, enter the shell (optional when working using an IDE which recognises poetry environments):
-
-   ```bash
-   poetry shell
-   ```
+   * See the 'Install using Poetry' section of the [installation](../../getting-started/installation.md) page.
 
 3. How do I run tests for AutoEmulate?
    <!-- Instructions on how to execute the project's test suite to ensure changes do not introduce regressions. -->
@@ -65,23 +47,12 @@
 
 1. Where can I ask questions if I'm stuck?
    <!-- Information on where to find support, such as community forums, chat channels, or mailing lists. -->
+   * We use [GitHub Discussions](https://github.com/alan-turing-institute/autoemulate/discussions) for questions and general discussion.
 
-2. How does AutoEmulate handle contributions related to security issues?
-   <!-- Guidelines on reporting security vulnerabilities and how they are addressed by the project. -->
-
-3. Is there a code of conduct for contributors?
+2. Is there a code of conduct for contributors?
    <!-- Details on the project's code of conduct, expectations for respectful and constructive interaction, and how to report violations. -->
+   * Yes, see the [code of conduct](../code-of-conduct.md).
 
-4. How can I get involved in decision-making or project planning as a contributor?
+3. How can I get involved in decision-making or project planning as a contributor?
    <!-- Explanation of how the project governance works, ways to participate in project roadmap discussions, and opportunities for contributors to influence development priorities. -->
-
-## Beyond Code Contributions
-
-1. Can I contribute without coding, for example, through design, marketing, or community management?
-   <!-- Overview of non-code contribution opportunities, including outreach efforts, event organisation, or community moderation. -->
-
-2. How does the project recognise or reward contributions?
-   <!-- Information on acknowledgment of contributions through all-contributors. -->
-
-3. Are there regular meetings or forums where contributors can discuss the project?
-   <!-- Schedule and formats of any regular contributor meetings, forums for discussion, or channels for real-time communication among contributors. -->
+   * We use GitHub [Discussions](https://github.com/alan-turing-institute/autoemulate/discussions) for general discussion and [Issues](https://github.com/alan-turing-institute/autoemulate/issues) for project planning and development.
\ No newline at end of file
diff --git a/docs/getting-started/installation.md b/docs/getting-started/installation.md
index 6ad0860e..d0120a6d 100644
--- a/docs/getting-started/installation.md
+++ b/docs/getting-started/installation.md
@@ -2,38 +2,52 @@
 
 `AutoEmulate` is a Python package that can be installed in a number of ways. In this section we will describe the main ways to install the package.
 
-## Install from PyPI
+## Install from GitHub
 
 This is the easiest way to install `AutoEmulate`.
 
-Currently, because we are in active development, you have to install the development version from GitHub:
+Currently, because we are in active development, we recommend installing the development version from GitHub:
+
+```bash
+pip install git+https://github.com/alan-turing-institute/autoemulate.git
+```
+
+## Install from PyPI
+
+Once we have a release on PyPI, you can install the package from there:
 
 ```bash
-$ pip install git+https://github.com/alan-turing-institute/autoemulate.git
+pip install autoemulate
 ```
 
 ## Install using Poetry
 
-If you are a code contributor, you can also use [Poetry](https://python-poetry.org/)
+If you'd like to contribute to `AutoEmulate`, you can install the package using Poetry.
+
+* Ensure you have Poetry installed. If not, install it following the [official instructions](https://python-poetry.org/docs/).
+
+* Fork the repository on GitHub by clicking the "Fork" button at the top right of the [AutoEmulate repository](https://github.com/alan-turing-institute/autoemulate).
+
+* Clone your forked repository:
 
 ```bash
-$ git clone https://github.com/alan-turing-institute/autoemulate.git
+git clone https://github.com/YOUR-USERNAME/autoemulate.git
 ```
 
 Navigate into the directory:
 
-```
-$ cd autoemulate
+```bash
+cd autoemulate
 ```
 
 Set up poetry:
 
-```
-$ poetry install
+```bash
+poetry install
 ```
 
 Enter the poetry shell:
 
-```
-$ poetry shell
+```bash
+poetry shell
 ```