From 69bf48df823dda128487029d986460b85b8feec1 Mon Sep 17 00:00:00 2001 From: James Azam Date: Fri, 24 Nov 2023 14:46:53 +0000 Subject: [PATCH] Add Getting Started Vignette --- vignettes/EpiNow2.Rmd | 153 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 153 insertions(+) create mode 100644 vignettes/EpiNow2.Rmd diff --git a/vignettes/EpiNow2.Rmd b/vignettes/EpiNow2.Rmd new file mode 100644 index 000000000..f5704c9d8 --- /dev/null +++ b/vignettes/EpiNow2.Rmd @@ -0,0 +1,153 @@ +--- +title: "Getting started with EpiNow2" +output: + rmarkdown::html_vignette: + toc: false + number_sections: false +bibliography: library.bib +csl: https://raw.githubusercontent.com/citation-style-language/styles/master/apa-numeric-superscript-brackets.csl +vignette: > + %\VignetteIndexEntry{Getting started with EpiNow2} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +## Quick start + +In the following section we give an overview of the simple use case for `epinow()` and `regional_epinow()`. + +The first step to using the package is to load it as follows. + +```{r, message = FALSE} +library(EpiNow2) +``` + +### Reporting delays, incubation period and generation time + +Distributions can be supplied in two ways. First, by supplying delay data to `estimate_delay()`, where a subsampled bootstrapped lognormal will be fit to account for uncertainty in the observed data without being biased by changes in incidence (see `?EpiNow2::estimate_delay()`). + +Second, by specifying predetermined delays with uncertainty using `dist_spec()`. An arbitrary number of delay distributions are supported in `dist_spec()` with the most common use case likely to be an incubation period followed by a reporting delay (see `?EpiNow2::dist_spec()`). + +For example if data on the delay between onset and infection was available we could fit a distribution to it, using `estimate_delay()`, with appropriate uncertainty as follows (note this is a synthetic example), +```{r, eval = FALSE} +reporting_delay <- estimate_delay( + rlnorm(1000, log(2), 1), + max_value = 14, bootstraps = 1 +) +``` + +If data was not available we could instead specify an informed estimate of the likely delay +using `dist_spec()`. To demonstrate, we choose a lognormal distribution with mean 2, standard deviation 1 (both with some uncertainty) and a maximum of 10. *This is just an example and unlikely to apply in any particular use case*. + +```{r} +reporting_delay <- dist_spec( + mean = convert_to_logmean(2, 1), + mean_sd = 0.1, + sd = convert_to_logsd(2, 1), + sd_sd = 0.1, + max = 10, + dist = "lognormal" +) +reporting_delay +``` + +For the rest of this vignette, we will use inbuilt example literature estimates for the incubation period and generation time of Covid-19 (see [here](https://github.com/epiforecasts/EpiNow2/tree/main/data-raw) for the code that generates these estimates). *These distributions are unlikely to be applicable for your use case. We strongly recommend investigating what might be the best distributions to use in any given use case.* + +```{r} +example_generation_time +example_incubation_period +``` + +Now, to the functions. + +### [epinow()](https://epiforecasts.io/EpiNow2/reference/epinow.html) + +This function represents the core functionality of the package and includes results reporting, plotting, and optional saving. It requires a data frame of cases by date of report and the distributions defined above. + +Load example case data from `{EpiNow2}`. + +```{r} +reported_cases <- example_confirmed[1:60] +head(reported_cases) +``` + +Estimate cases by date of infection, the time-varying reproduction number, the rate of growth, and forecast these estimates into the future by 7 days. Summarise the posterior and return a summary table and plots for reporting purposes. If a `target_folder` is supplied results can be internally saved (with the option to also turn off explicit returning of results). Here we use the default model parameterisation that prioritises real-time performance over run-time or other considerations. For other formulations see the documentation for `estimate_infections()`. + +```{r, message = FALSE, warning = FALSE} +estimates <- epinow( + reported_cases = reported_cases, + generation_time = generation_time_opts(example_generation_time), + delays = delay_opts(example_incubation_period + reporting_delay), + rt = rt_opts(prior = list(mean = 2, sd = 0.2)), + stan = stan_opts(cores = 4, control = list(adapt_delta = 0.99)), + verbose = interactive() +) +names(estimates) +``` + +Both summary measures and posterior samples are returned for all parameters in an easily explored format which can be accessed using `summary`. The default is to return a summary table of estimates for key parameters at the latest date partially supported by data. + +```{r} +knitr::kable(summary(estimates)) +``` + +Summarised parameter estimates can also easily be returned, either filtered for a single parameter or for all parameters. + +```{r} +head(summary(estimates, type = "parameters", params = "R")) +``` + +Reported cases are returned in a separate data frame in order to streamline the reporting of forecasts and for model evaluation. + +```{r} +head(summary(estimates, output = "estimated_reported_cases")) +``` + +A range of plots are returned (with the single summary plot shown below). These plots can also be generated using the following `plot` method. + +```{r, dpi = 330, fig.width = 12, fig.height = 12, message = FALSE, warning = FALSE} +plot(estimates) +``` + + +### [regional_epinow()](https://epiforecasts.io/EpiNow2/reference/regional_epinow.html) + +The `regional_epinow()` function runs the `epinow()` function across multiple regions in +an efficient manner. + +Define cases in multiple regions delineated by the region variable. + +```{r} +reported_cases <- data.table::rbindlist(list( + data.table::copy(reported_cases)[, region := "testland"], + reported_cases[, region := "realland"] +)) +head(reported_cases) +``` + +Calling `regional_epinow()` runs the `epinow()` on each region in turn (or in parallel depending on the settings used). Here we switch to using a weekly random walk rather than the full Gaussian process model giving us piecewise constant estimates by week. + +```{r, message = FALSE, warning = FALSE} +estimates <- regional_epinow( + reported_cases = reported_cases, + generation_time = generation_time_opts(example_generation_time), + delays = delay_opts(example_incubation_period + reporting_delay), + rt = rt_opts(prior = list(mean = 2, sd = 0.2), rw = 7), + gp = NULL, + stan = stan_opts(cores = 4, warmup = 250, samples = 1000) +) +``` + +Results from each region are stored in a `regional` list with across region summary measures and plots stored in a `summary` list. All results can be set to be internally saved by setting the `target_folder` and `summary_dir` arguments. Each region can be estimated in parallel using the `{future}` package (when in most scenarios `cores` should be set to 1). For routine use each MCMC chain can also be run in parallel (with `future` = TRUE) with a time out (`max_execution_time`) allowing for partial results to be returned if a subset of chains is running longer than expected. See the documentation for the `{future}` package for details on nested futures. + +Summary measures that are returned include a table formatted for reporting (along with raw results for further processing). Futures updated will extend the S3 methods used above to smooth access to this output. + +```{r} +knitr::kable(estimates$summary$summarised_results$table) +``` + +A range of plots are again returned (with the single summary plot shown below). + +```{r, dpi = 330, fig.width = 12, fig.height = 12, message = FALSE, warning = FALSE} +estimates$summary$summary_plot +``` \ No newline at end of file