diff --git a/docs/source/development/pyrealm_build_data.md b/docs/source/development/pyrealm_build_data.md
index b59ec2b2..41035bb8 100644
--- a/docs/source/development/pyrealm_build_data.md
+++ b/docs/source/development/pyrealm_build_data.md
@@ -11,138 +11,92 @@ kernelspec:
   name: python3
 ---
 
-# The `pyrealm_build_data` package
-
-The `pyrealm` repository includes both the `pyrealm` package and the
-`pyrealm_build_data` package. The `pyrealm_build_data` package contains datasets that
-are used in the `pyrealm` build and testing process. This includes:
-
-* Example datasets that are used in the package documentation, such as simple spatial
-  datasets for showing the use of the P Model.
-* "Golden" datasets for regression testing `pyrealm` implementations against the outputs
-  of other implementations. These datasets will include a set of input data and then
-  output predictions from other implementations.
-* Datasets for providing profiling of `pyrealm` code and for benchmarking new versions
-  of the package code against earlier implementations to check for performance issues.
-
-Note that `pyrealm_build_data` is a source distribution only (`sdist`) component of
-`pyrealm`, so is not included in binary distributions (`wheel`) that are typically
-installed by end users. This means that files in `pyrealm_build_data` are not available
-if a user has simply used `pip install pyrealm`: please *do not* use
-`pyrealm_build_data` within the main `pyrealm` code.
-
-## Package contents
-
-The package is organised into submodules that reflect the data use or previous
-implementation.
-
-### The `bigleaf` submodule
-
-This submodule contains benchmark outputs from the `bigleaf` package in `R`, which has
-been used as the basis for core hygrometry functions. The `bigleaf_conversions.R` R
-script runs a set of test values through `bigleaf`. The first part of the file prints
-out some simple test values that have been used in package doctests and then the second
-part of the file generates more complex benchmarking inputs that are saved, along with
-`bigleaf` outputs as `bigleaf_test_values.json`.
-
-Running `bigleaf_conversions.R` requires an installation of R along with the `jsonlite`
-and `bigleaf` packages, and the script can then be run from within the submodule folder
-as:
-
-```sh
-Rscript bigleaf_conversions.R
-```
-
-### The `rpmodel` submodule
-
-This submodule contains benchmark outputs from the `rpmodel` package in `R`, which has
-been used as the basis for initial development of the standard P Model.
-
-#### Test inputs
+# The {mod}`~pyrealm_build_data` module
 
-The `generate_test_inputs.py` file defines a set of constants for running P Model
-calculations and then defines a set of scalar and array inputs for the forcing variables
-required to run the P Model. The array inputs are set of 100 values sampled randomly
-across the ranges of plausible forcing value inputs in order to benchmark the
-calculations of the P Model implementation. All of these values are stored in the
-`test_inputs.json` file.
-
-It requires `python` and the `numpy` package and can be run as:
-
-```sh
-python generate_test_inputs.py
+```{eval-rst}
+.. automodule:: pyrealm_build_data
+    :autosummary:
+    :members:
+    :special-members: __init__
 ```
 
-#### Simple `rpmodel` benchmarking
-
-The `test_outputs_rpmodel.R` contains R code to run the test input data set, and store
-the expected predictions from the `rpmodel` package as `test_outputs_rpmodel.json`. It
-requires an installation of `R` and the `rpmodel` package and can be run as:
+## The `bigleaf` submodule
 
-```sh
-Rscript test_outputs_rpmodel.R
+```{eval-rst}
+.. automodule:: pyrealm_build_data.bigleaf
+    :autosummary:
+    :members:
+    :special-members: __init__
 ```
 
-#### Global array test
+## The `community` submodule
 
-The remaining files in the submodule are intended to provide a global test dataset for
-benchmarking the use of `rpmodel` on a global time-series, so using 3 dimensional arrays
-with latitude, longitude and time coordinates. It is currently not used in testing
-because of issues with the `rpmodel` package in version 1.2.0. It may also be replaced
-in testing with the `uk_data` submodule, which is used as an example dataset in the
-documentation.
+```{eval-rst}
+.. automodule:: pyrealm_build_data.community
+    :autosummary:
+    :members:
+    :special-members: __init__
+```
 
-The files are:
+## The `rpmodel` submodule
 
-* pmodel_global.nc: An input global NetCDF file containing forcing variables at 0.5°
-  spatial resolution and for two time steps.
-* test_global_array.R: An R script to run `rpmodel` using the dataset.
-* rpmodel_global_gpp_do_ftkphio.nc: A NetCDF file containing `rpmodel` predictions using
- corrections for temperature effects on the `kphio` parameter.
-* rpmodel_global_gpp_no_ftkphio.nc: A NetCDF file containing `rpmodel` predictions with
-  fixed `kphio`.
+```{eval-rst}
+.. automodule:: pyrealm_build_data.rpmodel
+    :autosummary:
+    :members:
+    :special-members: __init__
+```
 
-To generate the predicted outputs again requires an R installation with the `rpmodel`
-package:
+## The `sandoval_kphio` submodule
 
-```sh
-Rscript test_global_array.R
+```{eval-rst}
+.. automodule:: pyrealm_build_data.sandoval_kphio
+    :autosummary:
+    :members:
+    :special-members: __init__
 ```
 
-### The `subdaily` submodule
+## The `splash` submodule
 
-At present, this submodule only contains a single file containing the predictions for
-the `BE_Vie` fluxnet site from the original implementation of the `subdaily` module,
-published in {cite}`mengoli:2022a`. Generating these predictions requires an
-installation of R and then code from the following repository:
+```{eval-rst}
+.. automodule:: pyrealm_build_data.splash
+    :autosummary:
+    :members:
+    :special-members: __init__
+```
 
-[https://github.com/GiuliaMengoli/P-model_subDaily](https://github.com/GiuliaMengoli/P-model_subDaily)
+## The `subdaily` submodule
 
-TODO - This submodule should be updated to include the required code along with the
-settings files and a runner script to reproduce this code. Or possibly to checkout the
-required code as part of a shell script.
+```{eval-rst}
+.. automodule:: pyrealm_build_data.subdaily
+    :autosummary:
+    :members:
+    :special-members: __init__
+```
 
-### The `t_model` submodule
+## The `t_model` submodule
 
-The `t_model.r` contains the original implementation of the T Model calculations in R
-{cite:p}`Li:2014bc`. The `rtmodel_test_outputs.r` script sources this file and then
-generates some simple bencmarking predictions, which are saved as `rtmodel_output.csv`.
+```{eval-rst}
+.. automodule:: pyrealm_build_data.t_model
+    :autosummary:
+    :members:
+    :special-members: __init__
+```
 
-To generate the predicted outputs again requires an R installation
+## The `two_leaf` submodule
 
-```sh
-Rscript rtmodel_test_outputs.r
+```{eval-rst}
+.. automodule:: pyrealm_build_data.two_leaf
+    :autosummary:
+    :members:
+    :special-members: __init__
 ```
 
-### The `uk_data` submodule
-
-This submodule contains the Python script `create_2D_uk_inputs.py`, which is used to
-generate the NetCDF output file `UK_WFDE5_FAPAR_2018_JuneJuly.nc`. This contains P Model
-forcings for the United Kingdom at 0.5° spatial resolution and hourly temporal
-resolution over 2 months (1464 temporal observations). It is used for demonstrating the
-use of the subdaily P Model.
+## The `uk_data` submodule
 
-The script is currently written with a hard-coded set of paths to key source data - the
-WFDE5 v2 climate data and a separate source of interpolated hourly fAPAR. This should
-probably be rewritten to generate reproducible content from publically available sources
-of these datasets.
+```{eval-rst}
+.. automodule:: pyrealm_build_data.uk_data
+    :autosummary:
+    :members:
+    :special-members: __init__
+```
diff --git a/pyrealm_build_data/__init__.py b/pyrealm_build_data/__init__.py
index 13c9caca..1b5e9bc0 100644
--- a/pyrealm_build_data/__init__.py
+++ b/pyrealm_build_data/__init__.py
@@ -10,6 +10,9 @@
 * Datasets for providing profiling of ``pyrealm`` code and for benchmarking new versions
   of the package code against earlier implementations to check for performance issues.
 
+The package is organised into submodules, each reflecting either how its data is used
+or the earlier implementation that the data comes from.
+
 Note that ``pyrealm_build_data`` is a source distribution only (``sdist``) component of
 ``pyrealm``, so is not included in binary distributions (``wheel``) that are typically
 installed by end users. This means that files in ``pyrealm_build_data`` are not
diff --git a/pyrealm_build_data/community/__init__.py b/pyrealm_build_data/community/__init__.py
new file mode 100644
index 00000000..fc2675a9
--- /dev/null
+++ b/pyrealm_build_data/community/__init__.py
@@ -0,0 +1,5 @@
+"""The :mod:`pyrealm_build_data.community` submodule provides a set of input files for
+the :mod:`pyrealm.demography` module. The files are used both in unit tests of that
+module and as inputs when generating its documentation. They define plant functional
+types and plant communities in a range of formats.
+"""  # noqa: D205