diff --git a/vignettes/estimate_infections_options.Rmd b/vignettes/estimate_infections_options.Rmd new file mode 100644 index 000000000..a44e47317 --- /dev/null +++ b/vignettes/estimate_infections_options.Rmd @@ -0,0 +1,648 @@ +--- +title: "Examples: estimate_infections()" +output: + rmarkdown::html_vignette: + toc: true + number_sections: true +bibliography: library.bib +csl: https://raw.githubusercontent.com/citation-style-language/styles/master/apa-numeric-superscript-brackets.csl +vignette: > + %\VignetteIndexEntry{Examples: estimate_infections()} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + + + +The `estimate_infections()` function encodes a range of different model options. +In this vignette we apply some of these to the example data provided with the _EpiNow2_ package, highlighting differences in inference results and run times. +It is not meant as a comprehensive exploration of all the functionality in the package, but intended to give users a flavour of the kind of model options that exist for reproduction number estimation and forecasting within the package, and the differences in computational speed between them. +For mathematical detail on the model please consult the [model definition](estimate_infections.html) vignette, and for a more general description of the use of the function, the [estimate_infections workflow](estimate_infections_workflow.html) vignette. + +# Set up + +We first load the _EpiNow2_ package and also the _rstan_ package that we will use later to show the differences in run times between different model options. + + +```r +library("EpiNow2") +library("rstan") +#> Loading required package: StanHeaders +#> +#> rstan version 2.26.23 (Stan version 2.26.1) +#> For execution on a local, multicore CPU with excess RAM we recommend calling +#> options(mc.cores = parallel::detectCores()). +#> To avoid recompilation of unchanged Stan programs, we recommend calling +#> rstan_options(auto_write = TRUE) +#> For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions, +#> change `threads_per_chain` option: +#> rstan_options(threads_per_chain = 1) +``` + +We then set the number of cores to use. We will want to run 4 MCMC chains in parallel so we set this to 4. If we had fewer than 4 available or wanted to run fewer than 4 chains (at the expense of some robustness), or had fewer than 4 computing cores available we could set it to that. To find out the number of cores available one can use the [detectCores](https://rdrr.io/r/parallel/detectCores.html) function from the `parallel` package. + + +```r +options(mc.cores = 4) +``` + +# Data + +As data set we will use an example data set that is included in the package, representing an outbreak of COVID-19 with an initial rapid increase followed by decreasing incidence. + + +```r +library("ggplot2") +reported_cases <- example_confirmed[1:60] +ggplot(reported_cases, aes(x = date, y = confirm)) + + geom_col() + + theme_minimal() + + xlab("Date") + + ylab("Cases") +``` + +![plot of chunk data](figure/data-1.png) + +# Parameters + +Before running the model we need to decide on some parameter values, in particular any delays, the generation time, and a prior on the initial reproduction number. + +## Delays: incubation period and reporting delay + +Delays reflect the time that passes between infection and reporting, if these exist. In this example, we assume two delays, an _incubation period_ (i.e. delay from infection to symptom onset) and a _reporting delay_ (i.e. the delay from symptom onset to being recorded as a symptomatic case). These delays are usually not the same for everyone and are instead characterised by a distribution. For the incubation period we use an example from the literature that is included in the package. + + +```r +incubation_period <- get_incubation_period( + disease = "SARS-CoV-2", source = "lauer" +) +incubation_period +#> +#> Uncertain lognormal distribution with (untruncated) logmean 1.6 (SD 0.064) and logSD 0.42 (SD 0.069) +``` + +For the reporting delay, we use a lognormal distribution with mean of 2 days and +standard deviation of 1 day. + + +```r +reporting_delay <- dist_spec( + mean = convert_to_logmean(2, 1), mean_sd = 0, + sd = convert_to_logsd(2, 1), sd_sd = 0, max = 10 +) +reporting_delay +#> +#> Fixed distribution with PMF [0.11 0.48 0.27 0.093 0.029 0.0096 0.0033 0.0012 0.00045 0.00018] +``` + +We can combine these delays into one by summing them up + + +```r +delay <- incubation_period + reporting_delay +delay +#> +#> Combination of delay distributions: +#> Uncertain lognormal distribution with (untruncated) logmean 1.6 (SD 0.064) and logSD 0.42 (SD 0.069) +#> Fixed distribution with PMF [0.11 0.48 0.27 0.093 0.029 0.0096 0.0033 0.0012 0.00045 0.00018] +``` + +## Generation time + +If we want to estimate the reproduction number we need to provide a distribution of generation times. Here again we use an example from the literature that is included with the package. + + +```r +generation_time <- get_generation_time( + disease = "SARS-CoV-2", source = "ganyani" +) +generation_time +#> +#> Uncertain gamma distribution with (untruncated) mean 3.6 (SD 0.71) and SD 3.1 (SD 0.77) +``` + +## Initial reproduction number + +Lastly we need to choose a prior for the initial value of the reproduction number. This is assumed by the model to be normally distributed and we can set the mean and the standard deviation. We decide to set the mean to 2 and the standard deviation to 1. + + +```r +rt_prior <- list(mean = 2, sd = 0.1) +``` + +# Running the model + +We are now ready to run the model and will in the following show a number of possible options for doing so. + +## Default options + +By default the model uses a renewal equation for infections and a Gaussian Process prior for the reproduction number. +Putting all the data and parameters together and tweaking the Gaussian Process to have a shorter length scale prior than the default we run. + + +```r +def <- estimate_infections(reported_cases, + generation_time = generation_time_opts(generation_time), + delays = delay_opts(delay), + rt = rt_opts(prior = rt_prior) +) +#> Warning: There were 5 divergent transitions after warmup. See +#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup +#> to find out why this is a problem and how to eliminate them. +#> Warning: Examine the pairs() plot to diagnose sampling problems +# summarise results +summary(def) +#> measure estimate +#> 1: New confirmed cases by infection date 2263 (1103 -- 4313) +#> 2: Expected change in daily cases Likely decreasing +#> 3: Effective reproduction no. 0.88 (0.61 -- 1.2) +#> 4: Rate of growth -0.027 (-0.099 -- 0.038) +#> 5: Doubling/halving time (days) -26 (18 -- -7) +# elapsed time (in seconds) +get_elapsed_time(def$fit) +#> warmup sample +#> chain:1 26.884 24.012 +#> chain:2 23.876 24.436 +#> chain:3 28.895 22.938 +#> chain:4 31.179 23.949 +# summary plot +plot(def) +``` + +![plot of chunk default](figure/default-1.png) + +## Reducing the accuracy of the approximate Gaussian Process + +To speed up the calculation of the Gaussian Process we could decrease its accuracy, e.g. decrease the proportion of time points to use as basis functions from the default of 0.2 to 0.1. + + +```r +agp <- estimate_infections(reported_cases, + generation_time = generation_time_opts(generation_time), + delays = delay_opts(delay), + rt = rt_opts(prior = rt_prior), + gp = gp_opts(basis_prop = 0.1) +) +#> Warning: There were 6 divergent transitions after warmup. See +#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup +#> to find out why this is a problem and how to eliminate them. +#> Warning: Examine the pairs() plot to diagnose sampling problems +# summarise results +summary(agp) +#> measure estimate +#> 1: New confirmed cases by infection date 2317 (1199 -- 4352) +#> 2: Expected change in daily cases Likely decreasing +#> 3: Effective reproduction no. 0.89 (0.63 -- 1.2) +#> 4: Rate of growth -0.024 (-0.091 -- 0.037) +#> 5: Doubling/halving time (days) -29 (19 -- -7.6) +# elapsed time (in seconds) +get_elapsed_time(agp$fit) +#> warmup sample +#> chain:1 23.946 28.330 +#> chain:2 19.281 26.257 +#> chain:3 22.943 20.861 +#> chain:4 21.291 21.736 +# summary plot +plot(agp) +``` + +![plot of chunk lower_accuracy](figure/lower_accuracy-1.png) + +## Adjusting for future susceptible depletion + +We might want to adjust for future susceptible depletion. +Here, we do so by setting the population to 1000000 and projecting the reproduction number from the latest estimate (rather than the default, which fixes the reproduction number to an earlier time point based on the given reporting delays). +Note that this only affects the forecasts and is done using a crude adjustment (see the [model definition](estimate_infections.html)). + + +```r +dep <- estimate_infections(reported_cases, + generation_time = generation_time_opts(generation_time), + delays = delay_opts(delay), + rt = rt_opts( + prior = rt_prior, + pop = 1000000, future = "latest" + ) +) +#> Warning: There were 10 divergent transitions after warmup. See +#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup +#> to find out why this is a problem and how to eliminate them. +#> Warning: Examine the pairs() plot to diagnose sampling problems +# summarise results +summary(dep) +#> measure estimate +#> 1: New confirmed cases by infection date 2288 (1129 -- 4339) +#> 2: Expected change in daily cases Likely decreasing +#> 3: Effective reproduction no. 0.88 (0.62 -- 1.2) +#> 4: Rate of growth -0.026 (-0.096 -- 0.036) +#> 5: Doubling/halving time (days) -27 (19 -- -7.2) +# elapsed time (in seconds) +get_elapsed_time(dep$fit) +#> warmup sample +#> chain:1 29.140 24.415 +#> chain:2 28.424 24.141 +#> chain:3 31.704 24.037 +#> chain:4 34.415 25.034 +# summary plot +plot(dep) +``` + +![plot of chunk susceptible_depletion](figure/susceptible_depletion-1.png) + +## Adjusting for truncation of the most recent data + +We might further want to adjust for right-truncation of recent data estimated using the [estimate_truncation](estimate_truncation.html) model. +Here, instead of doing so we assume that we know about truncation with mean of 1/2 day, sd 1/2 day, following a lognormal distribution and with a maximum of three days. + + +```r +trunc_dist <- dist_spec( + mean = convert_to_logmean(0.5, 0.5), mean_sd = 0.1, + sd = convert_to_logsd(0.5, 0.5), sd_sd = 0.1, + max = 3 +) +``` + +We can then use this in the `esimtate_infections()` function using the `truncation` option. + + +```r +trunc <- estimate_infections(reported_cases, + generation_time = generation_time_opts(generation_time), + delays = delay_opts(delay), + truncation = trunc_opts(trunc_dist), + rt = rt_opts(prior = rt_prior) +) +#> Warning: There were 4 divergent transitions after warmup. See +#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup +#> to find out why this is a problem and how to eliminate them. +#> Warning: Examine the pairs() plot to diagnose sampling problems +# summarise results +summary(trunc) +#> measure estimate +#> 1: New confirmed cases by infection date 2527 (1235 -- 4693) +#> 2: Expected change in daily cases Likely decreasing +#> 3: Effective reproduction no. 0.92 (0.64 -- 1.2) +#> 4: Rate of growth -0.019 (-0.088 -- 0.043) +#> 5: Doubling/halving time (days) -37 (16 -- -7.9) +# elapsed time (in seconds) +get_elapsed_time(trunc$fit) +#> warmup sample +#> chain:1 24.744 27.070 +#> chain:2 34.858 26.868 +#> chain:3 34.714 46.675 +#> chain:4 26.364 23.112 +# summary plot +plot(trunc) +``` + +![plot of chunk truncation](figure/truncation-1.png) + +## Projecting the reproduction number with the Gaussian Process + +Instead of keeping the reproduction number fixed from a certain time point we might want to extrapolate the Gaussian Process into the future. +This will lead to wider uncertainty, and the researcher should check whether this or fixing the reproduction number from an earlier is desirable. + + +```r +project_rt <- estimate_infections(reported_cases, + generation_time = generation_time_opts(generation_time), + delays = delay_opts(delay), + rt = rt_opts( + prior = rt_prior, future = "project" + ) +) +#> Warning: There were 11 divergent transitions after warmup. See +#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup +#> to find out why this is a problem and how to eliminate them. +#> Warning: Examine the pairs() plot to diagnose sampling problems +# summarise results +summary(project_rt) +#> measure estimate +#> 1: New confirmed cases by infection date 2300 (1125 -- 4607) +#> 2: Expected change in daily cases Likely decreasing +#> 3: Effective reproduction no. 0.89 (0.62 -- 1.2) +#> 4: Rate of growth -0.025 (-0.096 -- 0.042) +#> 5: Doubling/halving time (days) -27 (16 -- -7.2) +# elapsed time (in seconds) +get_elapsed_time(project_rt$fit) +#> warmup sample +#> chain:1 29.487 22.629 +#> chain:2 36.470 41.198 +#> chain:3 28.225 30.417 +#> chain:4 28.653 45.243 +# summary plot +plot(project_rt) +``` + +![plot of chunk gp_projection](figure/gp_projection-1.png) + +## Fixed reproduction number + +We might want to estimate a fixed reproduction number, i.e. assume that it does not change. + + +```r +fixed <- estimate_infections(reported_cases, + generation_time = generation_time_opts(generation_time), + delays = delay_opts(delay), + gp = NULL +) +# summarise results +summary(fixed) +#> measure estimate +#> 1: New confirmed cases by infection date 15574 (9027 -- 27494) +#> 2: Expected change in daily cases Increasing +#> 3: Effective reproduction no. 1.2 (1.1 -- 1.2) +#> 4: Rate of growth 0.037 (0.026 -- 0.049) +#> 5: Doubling/halving time (days) 18 (14 -- 26) +# elapsed time (in seconds) +get_elapsed_time(fixed$fit) +#> warmup sample +#> chain:1 1.936 1.209 +#> chain:2 1.812 1.190 +#> chain:3 1.712 0.741 +#> chain:4 1.726 0.883 +# summary plot +plot(fixed) +``` + +![plot of chunk fixed](figure/fixed-1.png) + +## Breakpoints + +Instead of assuming the reproduction number varies freely or is fixed, we can assume that it is fixed but with breakpoints. +This can be done by adding a `breakpoint` column to the reported case data set. +e.g. if we think that the reproduction number was constant but would like to allow it to change on the 16th of March 2020 we would define a new case data set using + + +```r +bp_cases <- data.table::copy(reported_cases) +bp_cases <- bp_cases[, + breakpoint := ifelse(date == as.Date("2020-03-16"), 1, 0) +] +``` + +We then use this instead of `reported_cases` in the `estimate_infections()` function: + + +```r +bkp <- estimate_infections(bp_cases, + generation_time = generation_time_opts(generation_time), + delays = delay_opts(delay), + rt = rt_opts(prior = rt_prior), + gp = NULL +) +# summarise results +summary(bkp) +#> measure estimate +#> 1: New confirmed cases by infection date 2475 (2061 -- 3030) +#> 2: Expected change in daily cases Decreasing +#> 3: Effective reproduction no. 0.91 (0.88 -- 0.93) +#> 4: Rate of growth -0.02 (-0.026 -- -0.015) +#> 5: Doubling/halving time (days) -34 (-47 -- -27) +# elapsed time (in seconds) +get_elapsed_time(bkp$fit) +#> warmup sample +#> chain:1 3.244 2.394 +#> chain:2 3.374 2.464 +#> chain:3 2.983 2.374 +#> chain:4 2.885 2.480 +# summary plot +plot(bkp) +``` + +![plot of chunk bp](figure/bp-1.png) + +## Weekly random walk + +Instead of a smooth Gaussian Process we might want the reproduction number to change step-wise, e.g. every week. +This can be achieved using the `rw` option which defines the length of the time step in a random walk that the reproduction number is assumed to follow. + + +```r +rw <- estimate_infections(reported_cases, + generation_time = generation_time_opts(generation_time), + delays = delay_opts(delay), + rt = rt_opts(prior = rt_prior, rw = 7), + gp = NULL +) +# summarise results +summary(rw) +#> measure estimate +#> 1: New confirmed cases by infection date 2157 (1174 -- 4100) +#> 2: Expected change in daily cases Likely decreasing +#> 3: Effective reproduction no. 0.86 (0.65 -- 1.1) +#> 4: Rate of growth -0.031 (-0.088 -- 0.029) +#> 5: Doubling/halving time (days) -23 (24 -- -7.9) +# elapsed time (in seconds) +get_elapsed_time(rw$fit) +#> warmup sample +#> chain:1 6.964 9.797 +#> chain:2 7.062 9.976 +#> chain:3 7.581 10.123 +#> chain:4 5.757 10.048 +# summary plot +plot(rw) +``` + +![plot of chunk weekly_rw](figure/weekly_rw-1.png) + +## No delays + +Whilst _EpiNow2_ allows the user to specify delays, it can also run directly on the data as does e.g. the [EpiEstim](https://CRAN.R-project.org/package=EpiEstim) package. + + +```r +no_delay <- estimate_infections( + reported_cases, + generation_time = generation_time_opts(generation_time) +) +#> Warning: There were 5 divergent transitions after warmup. See +#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup +#> to find out why this is a problem and how to eliminate them. +#> Warning: Examine the pairs() plot to diagnose sampling problems +# summarise results +summary(no_delay) +#> measure estimate +#> 1: New confirmed cases by infection date 2782 (2326 -- 3333) +#> 2: Expected change in daily cases Decreasing +#> 3: Effective reproduction no. 0.88 (0.77 -- 0.99) +#> 4: Rate of growth -0.027 (-0.054 -- -0.0016) +#> 5: Doubling/halving time (days) -25 (-420 -- -13) +# elapsed time (in seconds) +get_elapsed_time(no_delay$fit) +#> warmup sample +#> chain:1 31.511 56.745 +#> chain:2 32.908 34.381 +#> chain:3 30.592 33.226 +#> chain:4 34.068 31.227 +# summary plot +plot(no_delay) +``` + +![plot of chunk no_delays](figure/no_delays-1.png) + +## Non-parametric infection model + +The package also includes a non-parametric infection model. +This runs much faster but does not use the renewal equation to generate infections. +Because of this none of the options defining the behaviour of the reproduction number are available in this case, limiting user choice and model generality. +It also means that the model is questionable for forecasting, which is why were here set the predictive horizon to 0. + + +```r +non_parametric <- estimate_infections(reported_cases, + generation_time = generation_time_opts(generation_time), + delays = delay_opts(delay), + rt = NULL, + backcalc = backcalc_opts(), + horizon = 0 +) +#> Warning in validityMethod(object): The following variables have undefined +#> values: gt_rev_pmf[1],The following variables have undefined values: +#> gt_rev_pmf[2],The following variables have undefined values: gt_rev_pmf[3],The +#> following variables have undefined values: gt_rev_pmf[4],The following +#> variables have undefined values: gt_rev_pmf[5],The following variables have +#> undefined values: gt_rev_pmf[6],The following variables have undefined values: +#> gt_rev_pmf[7],The following variables have undefined values: gt_rev_pmf[8],The +#> following variables have undefined values: gt_rev_pmf[9],The following +#> variables have undefined values: gt_rev_pmf[10],The following variables have +#> undefined values: gt_rev_pmf[11],The following variables have undefined values: +#> gt_rev_pmf[12],The following variables have undefined values: +#> gt_rev_pmf[13],The following variables have undefined values: +#> gt_rev_pmf[14],The following variables have undefined values: gt_rev_pmf[15]. +#> Many subsequent functions will not work correctly. +# summarise results +summary(non_parametric) +#> measure estimate +#> 1: New confirmed cases by infection date 2337 (1732 -- 3088) +#> 2: Expected change in daily cases Decreasing +#> 3: Effective reproduction no. 0.89 (0.78 -- 0.99) +#> 4: Rate of growth -0.023 (-0.049 -- -0.0021) +#> 5: Doubling/halving time (days) -30 (-330 -- -14) +# elapsed time (in seconds) +get_elapsed_time(non_parametric$fit) +#> warmup sample +#> chain:1 2.764 0.849 +#> chain:2 2.758 0.822 +#> chain:3 2.900 0.788 +#> chain:4 3.010 0.856 +# summary plot +plot(non_parametric) +``` + +![plot of chunk nonparametric](figure/nonparametric-1.png) + +# Running the model in production mode + +The package contains functionality to run `estimate_infections` in production mode, i.e. with full logging and saving all relevant outputs to the hard drive. +This is done with the `epinow()` function, that takes the same options as `estimate_infections` with some additional infections that determine, for example, where output gets stored and what output exactly. +The function can be a useful option when, e.g., running the model daily with updated data on a high-perforumance computing server to feed into a dashboard. + +## Running the model simultaneously on multiple regions + +The package also contains functionality to conduct inference contemporaneously (if separately) in production mode on multiple time series, e.g. to run the model on multiple regions. +This is done with the `regional_epinow()` function. + +Say, for example, we construct a data sets containing two regions, `testland` and `realland` (in this simple example both containing the same case data). + + +```r +cases <- example_confirmed[1:60] +cases <- data.table::rbindlist(list( + data.table::copy(cases)[, region := "testland"], + cases[, region := "realland"] + )) +``` + +To then run this on multiple regions using the default options above, we could use + + +```r +region_rt <- regional_epinow( + reported_cases = cases, + generation_time = generation_time_opts(generation_time), + delays = delay_opts(delay), + rt = rt_opts(prior = rt_prior), +) +#> INFO [2023-09-26 19:39:06] Producing following optional outputs: regions, summary, samples, plots, latest +#> Logging threshold set at INFO for the EpiNow2 logger +#> Writing EpiNow2 logs to the console and: /tmp/RtmpaIik9O/regional-epinow/2020-04-21.log +#> Logging threshold set at INFO for the EpiNow2.epinow logger +#> Writing EpiNow2.epinow logs to: /tmp/RtmpaIik9O/epinow/2020-04-21.log +#> INFO [2023-09-26 19:39:06] Reporting estimates using data up to: 2020-04-21 +#> INFO [2023-09-26 19:39:06] No target directory specified so returning output +#> INFO [2023-09-26 19:39:06] Producing estimates for: testland, realland +#> INFO [2023-09-26 19:39:06] Regions excluded: none +#> INFO [2023-09-26 19:40:20] Completed estimates for: testland +#> INFO [2023-09-26 19:41:33] Completed estimates for: realland +#> INFO [2023-09-26 19:41:33] Completed regional estimates +#> INFO [2023-09-26 19:41:33] Regions with estimates: 2 +#> INFO [2023-09-26 19:41:33] Regions with runtime errors: 0 +#> INFO [2023-09-26 19:41:33] Producing summary +#> INFO [2023-09-26 19:41:33] No summary directory specified so returning summary output +#> INFO [2023-09-26 19:41:33] No target directory specified so returning timings +## summary +region_rt$summary$summarised_results$table +#> Region New confirmed cases by infection date +#> 1: realland 2251 (1117 -- 4394) +#> 2: testland 2228 (1130 -- 4254) +#> Expected change in daily cases Effective reproduction no. +#> 1: Likely decreasing 0.88 (0.62 -- 1.2) +#> 2: Likely decreasing 0.87 (0.61 -- 1.2) +#> Rate of growth Doubling/halving time (days) +#> 1: -0.027 (-0.095 -- 0.036) -26 (19 -- -7.3) +#> 2: -0.028 (-0.097 -- 0.034) -25 (21 -- -7.1) +## plot +region_rt$summary$plots$R +``` + +![plot of chunk regional_epinow](figure/regional_epinow-1.png) + +If instead, we wanted to use the Gaussian Process for `testland` and a weekly random walk for `realland` we could specify these separately using the `opts_list()` and `update_list()` functions + + +```r +gp <- opts_list(gp_opts(), cases) +gp <- update_list(gp, list(realland = NULL)) +rt <- opts_list(rt_opts(), cases, realland = rt_opts(rw = 7)) +region_separate_rt <- regional_epinow( + reported_cases = cases, + generation_time = generation_time_opts(generation_time), + delays = delay_opts(delay), + rt = rt, gp = gp, +) +#> INFO [2023-09-26 19:41:33] Producing following optional outputs: regions, summary, samples, plots, latest +#> Logging threshold set at INFO for the EpiNow2 logger +#> Writing EpiNow2 logs to the console and: /tmp/RtmpaIik9O/regional-epinow/2020-04-21.log +#> Logging threshold set at INFO for the EpiNow2.epinow logger +#> Writing EpiNow2.epinow logs to: /tmp/RtmpaIik9O/epinow/2020-04-21.log +#> INFO [2023-09-26 19:41:33] Reporting estimates using data up to: 2020-04-21 +#> INFO [2023-09-26 19:41:33] No target directory specified so returning output +#> INFO [2023-09-26 19:41:34] Producing estimates for: testland, realland +#> INFO [2023-09-26 19:41:34] Regions excluded: none +#> INFO [2023-09-26 19:42:47] Completed estimates for: testland +#> INFO [2023-09-26 19:43:11] Completed estimates for: realland +#> INFO [2023-09-26 19:43:11] Completed regional estimates +#> INFO [2023-09-26 19:43:11] Regions with estimates: 2 +#> INFO [2023-09-26 19:43:11] Regions with runtime errors: 0 +#> INFO [2023-09-26 19:43:11] Producing summary +#> INFO [2023-09-26 19:43:11] No summary directory specified so returning summary output +#> INFO [2023-09-26 19:43:11] No target directory specified so returning timings +## summary +region_separate_rt$summary$summarised_results$table +#> Region New confirmed cases by infection date +#> 1: realland 2142 (1138 -- 4205) +#> 2: testland 2212 (999 -- 4399) +#> Expected change in daily cases Effective reproduction no. +#> 1: Likely decreasing 0.86 (0.63 -- 1.2) +#> 2: Likely decreasing 0.87 (0.57 -- 1.2) +#> Rate of growth Doubling/halving time (days) +#> 1: -0.032 (-0.093 -- 0.033) -21 (21 -- -7.5) +#> 2: -0.029 (-0.11 -- 0.037) -24 (19 -- -6.3) +## plot +region_separate_rt$summary$plots$R +``` + +![plot of chunk regional_epinow_multiple](figure/regional_epinow_multiple-1.png) diff --git a/vignettes/estimate_infections_workflow.Rmd b/vignettes/estimate_infections_workflow.Rmd new file mode 100644 index 000000000..76c9f2d62 --- /dev/null +++ b/vignettes/estimate_infections_workflow.Rmd @@ -0,0 +1,237 @@ +--- +title: "Workflow for Rt estimation and forecasting" +output: + rmarkdown::html_vignette: + toc: true + number_sections: true +bibliography: library.bib +csl: https://raw.githubusercontent.com/citation-style-language/styles/master/apa-numeric-superscript-brackets.csl +vignette: > + %\VignetteIndexEntry{Workflow for Rt estimation and forecasting} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + + + +In this vignette we describe the typical workflow by which someone might obtain reproduction number estimates and short-term forecasts for a given disease spreading in a given setting. +The vignette uses the default model included in the package. +See other vignettes for a more thorough exploration of [alternative model variants](estimate_infections_options.html) and [theoretical background](estimate_infections.html). + +# Data + +Obtaining a good and full understanding of the data being used an important first step in any inference procedure such as the one applied here. +EpiNow2 expects data in the format of a data frame with two columns, `date` and `confirm`, where `confirm` stands for the number of confirmed counts - although in reality this can be applied to any data including suspected cases and lab-confirmed outcomes. +The user might already have the data as such a time series provided, for example, on public dashboards or directly from public health authorities. +Alternatively, they can be constructed from individual-level data, for example using the [incidence2](https://cran.r-project.org/web/packages/incidence2/index.html) R package. +An example data set called `example_confirm` is included in the package: + + +```r +head(example_confirmed) +#> date confirm +#> 1: 2020-02-22 14 +#> 2: 2020-02-23 62 +#> 3: 2020-02-24 53 +#> 4: 2020-02-25 97 +#> 5: 2020-02-26 93 +#> 6: 2020-02-27 78 +``` + +Any estimation procedure is only as good as the data that feeds into it. +A thorough understanding of the data that is used for EpiNow2 and its limitations is a prerequisite for its use. +This includes but is not limited to biases in the population groups that are represented (EpiNow2 assumes a closed population with all infections being caused by other infections in the same population), reporting artefacts and delays, and completeness of reporting. +Some of these can be mitigated using the routines available in EpiNow2 as described below, but others will cause biases in the results and need to be carefully considered when interpreting the results. + +# Parameters + +Once a data set has been identified, a number of relevant parameters need to be considered before using EpiNow2. +As these will affect any results, it is worth spending some time investigating what their values should be. + +## Delay distributions + +EpiNow2 works with different delays that apply to different parts of the infection and observation process. +They are defined using a common interface with the `dist_spec()` function. +For help with this function, see its manual page + + +```r +?dist_spec +``` + +In all cases, the distributions given can be *fixed* (i.e. have no uncertainty) or *variable* (i.e. have associated uncertainty). +For example, to define a fixed gamma distribution with mean 3, standard deviation (sd) 1 and maximum value 10, you would write + + +```r +dist_spec(mean = 3, sd = 1, distribution = "gamma", max = 10) +#> +#> Fixed distribution with PMF [0.0038 0.15 0.39 0.3 0.12 0.03 0.006 0.00096 0.00013 1.6e-05] +``` + +If distributions are variable, the values with uncertainty are treated as [prior probability densities](https://en.wikipedia.org/wiki/Prior_probability) in the Bayesian inference framework used by EpiNow2, i.e. they are estimated as part of the inference. +For example, to define a variable gamma distribution where uncertainty in the mean is given by a normal distribution with mean 3 and sd 2, and uncertainty in the standard deviation is given by a normal distribution with mean 1 and sd 0.1, with a maximum value 10, you would write + + +```r +dist_spec( + mean = 3, mean_sd = 2, sd = 1, sd_sd = 0.1, distribution = "gamma", max = 10 +) +#> +#> Uncertain gamma distribution with (untruncated) mean 3 (SD 2) and SD 1 (SD 0.1) +``` + +There are various ways the specific delay distributions mentioned below might be obtained. +Often, they will come directly from the existing literature reviewed by the user and studies conducted elsewhere. +Sometimes it might be possible to obatined them from existing databases, e.g. using the [epiparameter](https://github.com/epiverse-trace/epiparameter) R package. +Alternatively they might be obtainable from raw data, e.g. linelists. +The _EpiNow2_ package contains functionality for estimating delay distributions from observed delays in the `estimate_delays()` function. +For a more comprehensive treatment of delays and their estimation avoiding common biases one can consider, for example, the [dynamicaltruncation](https://github.com/parksw3/epidist-paper) R package and associated paper. + +### Generation intervals + +The generation interval is a delay distribution that describes the amount of time that passes between an individual becoming infected and infecting someone else. +In EpiNow2, the generation time distribution is defined by a call to `generation_time_opts()`, a function that takes a single argument defined as a `dist_spec`. +For example, to define the generation time as gamma distributed with uncertain mean centered on 3 (sd: 2) and sd centered on 1 (sd: 0.1), a maximum value of 10 and weighted by the number of case data points we would use + + +```r +generation_time <- dist_spec( + mean = 3, mean_sd = 2, sd = 1, sd_sd = 0.1, distribution = "gamma", max = 10 +) +generation_time_opts(generation_time) +``` + +### Reporting delays + +EpiNow2 calculates reproduction numbers based on the trajectory of infection incidence. +Usually this is not observed directly. +Instead, we calculate case counts based on, for example, onset of symptoms, lab confirmations, hospitalisations, etc. +In order to estimate the trajectory of infection incidence from this we need to either know or estimate the distribution of delays from infection to count. +Often, such counts are composed of multiple delays for which we only have separate information, for example the incubation period (time from infection to symptom onset) and reporting delay (time from symptom onset to being a case in the data, e.g. via lab confirmation, if counts are not by the date of symptom onset). +In this case, we can combine multiple delays defined using `dist_spec()` with the plus (`+`) operator, e.g. + + +```r +incubation_period <- dist_spec( + mean = 1.6, mean_sd = 0.05, sd = 0.5, sd_sd = 0.05, + distribution = "lognormal", max = 14 +) +reporting_delay <- dist_spec( + mean = 0.5, sd = 0.5, distribution = "lognormal", max = 10 +) +incubation_period + reporting_delay +#> +#> Combination of delay distributions: +#> Uncertain lognormal distribution with (untruncated) logmean 1.6 (SD 0.05) and logSD 0.5 (SD 0.05) +#> Fixed distribution with PMF [0.16 0.49 0.23 0.077 0.025 0.0084 0.003 0.0011 0.00045 0.00019] +``` + +In EpiNow2, the reporting delay distribution is defined by a call to `delay_opts()`, a function that takes a single argument defined as a `dist_spec`. +For example, if our observations were by symptom onset we would use + + +```r +delay_opts(incubation_period) +``` + +If they were by date of lab confirmation that happens with a delay given by `reporting_delay`, we would use + + +```r +delay <- incubation_period + reporting_delay +delay_opts(delay) +``` + +### Truncation + +Besides the delay from infection to the event that is recorded in the data, there can also be a delay from that event to being recorded in the data. +For example, data by symptom onset may only enter the data once lab confirmation has occurred, or even a day or two after that confirmation. +Statistically, this means our data is right-truncated. +In practice, it means that recent data will be unlikely to be complete. + +The amount of such truncation that exists in the data can be estimated from multiple snapshots of the data, i.e. what the data looked like at multiple past dates. +One can then use methods that use the amount of backfilling that occurred 1, 2, ... days after data for a date are first reported. +In EpiNow2, this can be done using the `estimate_truncation()` method which returns, amongst others, posterior estimates of the truncation distribution. +For more details on the model used for this, see the [estimate_truncation](estimate_truncation.html) vignette. + + + +```r +?estimate_truncation +``` + +In the `estimate_infections()` function, the truncation distribution is defined by a call to `trunc_opts()`, a function that takes a single argument defined as a `dist_spec` (either defined by the user or obtained from a call to `estimate_truncation` or any other method for estimating right truncation). +This will then be used to correct for right truncation in the data. + +The separation of estimation of right truncation on the one hand and estimation of the reproduction number on the other may be attractive for practical purposes but is questionable statistically as it separates two processes that are not strictly separable, potentially introducing a bias. +For an alternative approach where these are estimated jointly that is being developed by some of the package authors of _EpiNow2_ with collaborators, see the [epinowcast](https://package.epinowcast.org/) package. + +## Completeness of reporting + +Another issue affecting the progression from infections to reported outcomes is underreporting, i.e. the fact that not all infections are reported as cases. +This varies both by pathogen and population (and e.g. the proportion of infections that are asymptomatic) as well as the specific outcome used as data and where it is located on the severity pyramid (e.g. hospitalisations vs. community cases). +In EpiNow2 we can specify the proportion of infections that we expect to be observed (with uncertainty assumed represented by a truncated normal distribution with bounds at 0 and 1) using the `scale` argument to the `obs_opts()` function. +For example, if we think that 40% (with standard deviation 1%) of infections end up in the data as observations we could specify. + + +```r +obs_scale <- list(mean = 0.4, sd = 0.01) +obs_opts(scale = obs_scale) +``` + +## Initial reproduction number + +The default model that `estimate_infections` uses EpiNow2 uses to estimate reproduction numbers specification of a prior probability distribution for the initial reproduction number. +This represents the user's initial belief of the value of the reproduction number, where there is no data yet to inform its value. +By default this is assumed to be represented by a lognormal distribution with mean and standard deviation of 1. +It can be changed using the `rt_opts()` function. +For example, if the users believes that at the very start of the data the reproduction number was 2, with uncertainty in this belief represented by a standard deviation of 1, they would used + + +```r +rt_prior <- list(mean = 2, sd = 1) +rt_opts(prior = rt_prior) +``` + +## Weighing delay priors + +When providing uncertain delay distributions one can end up in a situation where the estimated means are shifted a long way from the given distribution means, and possibly further than is deemed realistic by the user. +In that case, one could specify narrower prior distributions (e.g., smaller `mean_sd`) in order to keep the estimated means closer to the given mean, but this can be difficult to do in a principled manner in practice. +As a more straightforward alternative, one can choose to weigh the given priors by the number of data points in the case data set by setting `weight_delay_priors`, i.e. + +# Inference + +All the options are combined in a call to the `estimate_infections()` function. +For example, using some of the options described above one could call + + +```r +def <- estimate_infections( + example_confirmed, + generation_time = generation_time_opts(generation_time), + delays = delay_opts(delay), + rt = rt_opts(prior = rt_prior) +) +#> Warning: There were 7 divergent transitions after warmup. See +#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup +#> to find out why this is a problem and how to eliminate them. +#> Warning: Examine the pairs() plot to diagnose sampling problems +``` + +Alternatively, for production environments the `epinow` function can be used that uses `estimate_infections` internally but adds functionality for logging and saving results and plots in dedicated places in the user's file system. + +# Interpretation + +To visualise the results one can use the `plot` function that comes with the package + + +```r +plot(def) +`` + +The results returned by the `estimate_infections` model depend on the values assigned to all to parameters discussed in this vignette, i.e. delays, scaling, and reproduction numbers, as well as the model variant used and its parameters. +Any interpretation of the results will therefore need to bear these in mind, as well as any properties of the data and/or the subpopulations that it represents. +See the [Model options](estimate_infections_options.html) vignette for an illustration of the impact of model choice. +#> Error: attempt to use zero-length variable name +``` diff --git a/vignettes/figure/bp-1.png b/vignettes/figure/bp-1.png index bbc1d517c..b1b2bc587 100644 Binary files a/vignettes/figure/bp-1.png and b/vignettes/figure/bp-1.png differ diff --git a/vignettes/figure/default-1.png b/vignettes/figure/default-1.png index c9d5f9940..c867e2ebc 100644 Binary files a/vignettes/figure/default-1.png and b/vignettes/figure/default-1.png differ diff --git a/vignettes/figure/fixed-1.png b/vignettes/figure/fixed-1.png index 854d67a8e..d62f13c50 100644 Binary files a/vignettes/figure/fixed-1.png and b/vignettes/figure/fixed-1.png differ diff --git a/vignettes/figure/gp_projection-1.png b/vignettes/figure/gp_projection-1.png index c92fc18b0..74200873a 100644 Binary files a/vignettes/figure/gp_projection-1.png and b/vignettes/figure/gp_projection-1.png differ diff --git a/vignettes/figure/lower_accuracy-1.png b/vignettes/figure/lower_accuracy-1.png index 54f1a548e..ad9670269 100644 Binary files a/vignettes/figure/lower_accuracy-1.png and b/vignettes/figure/lower_accuracy-1.png differ diff --git a/vignettes/figure/no_delays-1.png b/vignettes/figure/no_delays-1.png index 0d16e7455..f918aa93c 100644 Binary files a/vignettes/figure/no_delays-1.png and b/vignettes/figure/no_delays-1.png differ diff --git a/vignettes/figure/nonparametric-1.png b/vignettes/figure/nonparametric-1.png index 5c43f6ec1..bbe7b5dd0 100644 Binary files a/vignettes/figure/nonparametric-1.png and b/vignettes/figure/nonparametric-1.png differ diff --git a/vignettes/figure/regional_epinow-1.png b/vignettes/figure/regional_epinow-1.png index c6d7524a1..7fae41e8b 100644 Binary files a/vignettes/figure/regional_epinow-1.png and b/vignettes/figure/regional_epinow-1.png differ diff --git a/vignettes/figure/regional_epinow_multiple-1.png b/vignettes/figure/regional_epinow_multiple-1.png index 3e74cebf1..0c5503642 100644 Binary files a/vignettes/figure/regional_epinow_multiple-1.png and b/vignettes/figure/regional_epinow_multiple-1.png differ diff --git a/vignettes/figure/susceptible_depletion-1.png b/vignettes/figure/susceptible_depletion-1.png index 37fa05d85..402966e26 100644 Binary files a/vignettes/figure/susceptible_depletion-1.png and b/vignettes/figure/susceptible_depletion-1.png differ diff --git a/vignettes/figure/truncation-1.png b/vignettes/figure/truncation-1.png index a89a15ecb..a994d2c52 100644 Binary files a/vignettes/figure/truncation-1.png and b/vignettes/figure/truncation-1.png differ diff --git a/vignettes/figure/weekly_rw-1.png b/vignettes/figure/weekly_rw-1.png index bca6c1534..a7bdd08fa 100644 Binary files a/vignettes/figure/weekly_rw-1.png and b/vignettes/figure/weekly_rw-1.png differ