Add Getting Started Vignette

epiforecasts · Nov 24, 2023 · 69bf48d · 69bf48d
1 parent b34a3a3
commit 69bf48d
Showing 1 changed file with 153 additions and 0 deletions.
diff --git a/vignettes/EpiNow2.Rmd b/vignettes/EpiNow2.Rmd
@@ -0,0 +1,153 @@
+---
+title: "Getting started with EpiNow2"
+output:
+  rmarkdown::html_vignette:
+    toc: false
+    number_sections: false
+bibliography: library.bib
+csl: https://raw.githubusercontent.com/citation-style-language/styles/master/apa-numeric-superscript-brackets.csl
+vignette: >
+  %\VignetteIndexEntry{Getting started with EpiNow2}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+## Quick start
+
+In the following section we give an overview of the simple use case for `epinow()` and `regional_epinow()`.
+
+The first step to using the package is to load it as follows.
+
+```{r, message = FALSE}
+library(EpiNow2)
+```
+
+### Reporting delays, incubation period and generation time
+
+Distributions can be supplied in two ways. First, by supplying delay data to `estimate_delay()`, where a subsampled bootstrapped lognormal will be fit to account for uncertainty in the observed data without being biased by changes in incidence (see `?EpiNow2::estimate_delay()`).
+
+Second, by specifying predetermined delays with uncertainty using `dist_spec()`. An arbitrary number of delay distributions are supported in `dist_spec()` with the most common use case likely to be an incubation period followed by a reporting delay (see `?EpiNow2::dist_spec()`).
+
+For example if data on the delay between onset and infection was available we could fit a distribution to it, using `estimate_delay()`, with appropriate uncertainty as follows (note this is a synthetic example),
+```{r, eval = FALSE}
+reporting_delay <- estimate_delay(
+  rlnorm(1000, log(2), 1),
+  max_value = 14, bootstraps = 1
+)
+```
+
+If data was not available we could instead specify an informed estimate of the likely delay
+using `dist_spec()`. To demonstrate, we choose a lognormal distribution with mean 2, standard deviation 1 (both with some uncertainty) and a maximum of 10. *This is just an example and unlikely to apply in any particular use case*.
+
+```{r}
+reporting_delay <- dist_spec(
+  mean = convert_to_logmean(2, 1),
+  mean_sd = 0.1,
+  sd = convert_to_logsd(2, 1),
+  sd_sd = 0.1,
+  max = 10,
+  dist = "lognormal"
+)
+reporting_delay
+```
+
+For the rest of this vignette, we will use inbuilt example literature estimates for the incubation period and generation time of Covid-19 (see [here](https://github.com/epiforecasts/EpiNow2/tree/main/data-raw) for the code that generates these estimates). *These distributions are unlikely to be applicable for your use case. We strongly recommend investigating what might be the best distributions to use in any given use case.*
+
+```{r}
+example_generation_time
+example_incubation_period
+```
+
+Now, to the functions.
+
+### [epinow()](https://epiforecasts.io/EpiNow2/reference/epinow.html)
+
+This function represents the core functionality of the package and includes results reporting, plotting, and optional saving. It requires a data frame of cases by date of report and the distributions defined above.
+
+Load example case data from `{EpiNow2}`.
+
+```{r}
+reported_cases <- example_confirmed[1:60]
+head(reported_cases)
+```
+
+Estimate cases by date of infection, the time-varying reproduction number, the rate of growth, and forecast these estimates into the future by 7 days. Summarise the posterior and return a summary table and plots for reporting purposes. If a `target_folder` is supplied results can be internally saved (with the option to also turn off explicit returning of results). Here we use the default model parameterisation that prioritises real-time performance over run-time or other considerations. For other formulations see the documentation for `estimate_infections()`.
+
+```{r, message = FALSE, warning = FALSE}
+estimates <- epinow(
+  reported_cases = reported_cases,
+  generation_time = generation_time_opts(example_generation_time),
+  delays = delay_opts(example_incubation_period + reporting_delay),
+  rt = rt_opts(prior = list(mean = 2, sd = 0.2)),
+  stan = stan_opts(cores = 4, control = list(adapt_delta = 0.99)),
+  verbose = interactive()
+)
+names(estimates)
+```
+
+Both summary measures and posterior samples are returned for all parameters in an easily explored format which can be accessed using `summary`. The default is to return a summary table of estimates for key parameters at the latest date partially supported by data. 
+
+```{r}
+knitr::kable(summary(estimates))
+```
+
+Summarised parameter estimates can also easily be returned, either filtered for a single parameter or for all parameters.
+
+```{r}
+head(summary(estimates, type = "parameters", params = "R"))
+```
+
+Reported cases are returned in a separate data frame in order to streamline the reporting of forecasts and for model evaluation.
+
+```{r}
+head(summary(estimates, output = "estimated_reported_cases"))
+```
+
+A range of plots are returned (with the single summary plot shown below). These plots can also be generated using the following `plot` method.
+
+```{r, dpi = 330, fig.width = 12, fig.height = 12, message = FALSE, warning = FALSE}
+plot(estimates)
+```
+
+
+### [regional_epinow()](https://epiforecasts.io/EpiNow2/reference/regional_epinow.html)
+
+The `regional_epinow()` function runs the `epinow()` function across multiple regions in
+an efficient manner.
+
+Define cases in multiple regions delineated by the region variable.
+
+```{r}
+reported_cases <- data.table::rbindlist(list(
+  data.table::copy(reported_cases)[, region := "testland"],
+  reported_cases[, region := "realland"]
+))
+head(reported_cases)
+```
+
+Calling `regional_epinow()` runs the `epinow()` on each region in turn (or in parallel depending on the settings used). Here we switch to using a weekly random walk rather than the full Gaussian process model giving us piecewise constant estimates by week.
+
+```{r, message = FALSE, warning = FALSE}
+estimates <- regional_epinow(
+  reported_cases = reported_cases,
+  generation_time = generation_time_opts(example_generation_time),
+  delays = delay_opts(example_incubation_period + reporting_delay),
+  rt = rt_opts(prior = list(mean = 2, sd = 0.2), rw = 7),
+  gp = NULL,
+  stan = stan_opts(cores = 4, warmup = 250, samples = 1000)
+)
+```
+
+Results from each region are stored in a `regional` list with across region summary measures and plots stored in a `summary` list. All results can be set to be internally saved by setting the `target_folder` and `summary_dir` arguments. Each region can be estimated in parallel using the `{future}` package (when in most scenarios `cores` should be set to 1). For routine use each MCMC chain can also be run in parallel (with `future` = TRUE) with a time out (`max_execution_time`) allowing for partial results to be returned if a subset of chains is running longer than expected. See the documentation for the `{future}` package for details on nested futures.
+
+Summary measures that are returned include a table formatted for reporting (along with raw results for further processing). Futures updated will extend the S3 methods used above to smooth access to this output.
+
+```{r}
+knitr::kable(estimates$summary$summarised_results$table)
+```
+
+A range of plots are again returned (with the single summary plot shown below).
+
+```{r, dpi = 330, fig.width = 12, fig.height = 12, message = FALSE, warning = FALSE}
+estimates$summary$summary_plot
+```