diff --git a/vignettes/estimate_infections_workflow-plot_combined_delay-1.png b/vignettes/estimate_infections_workflow-plot_combined_delay-1.png new file mode 100644 index 000000000..f98d4c881 Binary files /dev/null and b/vignettes/estimate_infections_workflow-plot_combined_delay-1.png differ diff --git a/vignettes/estimate_infections_workflow-plot_fixed_gamma-1.png b/vignettes/estimate_infections_workflow-plot_fixed_gamma-1.png new file mode 100644 index 000000000..50d71fde8 Binary files /dev/null and b/vignettes/estimate_infections_workflow-plot_fixed_gamma-1.png differ diff --git a/vignettes/estimate_infections_workflow-plot_uncertain_gamma-1.png b/vignettes/estimate_infections_workflow-plot_uncertain_gamma-1.png new file mode 100644 index 000000000..efa16ba38 Binary files /dev/null and b/vignettes/estimate_infections_workflow-plot_uncertain_gamma-1.png differ diff --git a/vignettes/estimate_infections_workflow-results-1.png b/vignettes/estimate_infections_workflow-results-1.png index 0e8fe0839..e47d0bd6a 100644 Binary files a/vignettes/estimate_infections_workflow-results-1.png and b/vignettes/estimate_infections_workflow-results-1.png differ diff --git a/vignettes/estimate_infections_workflow.Rmd b/vignettes/estimate_infections_workflow.Rmd index 5d6d3c2c1..3ace1dd40 100644 --- a/vignettes/estimate_infections_workflow.Rmd +++ b/vignettes/estimate_infections_workflow.Rmd @@ -24,11 +24,11 @@ Obtaining a good and full understanding of the data being used is an important f _EpiNow2_ expects data in the format of a data frame with two columns, `date` and `confirm`, where `confirm` stands for the number of confirmed counts - although in reality this can be applied to any data including suspected cases and lab-confirmed outcomes. The user might already have the data as such a time series provided, for example, on public dashboards or directly from public health authorities. Alternatively, they can be constructed from individual-level data, for example using the [incidence2](https://CRAN.R-project.org/package=incidence2) R package. -An example data set called `example_confirm` is included in the package: +An example data set called `example_confirmed` is included in the package: -```r -head(example_confirmed) +``` r +head(EpiNow2::example_confirmed) #> date confirm #> #> 1: 2020-02-22 14 @@ -49,14 +49,19 @@ Some of these can be mitigated using the routines available in _EpiNow2_ as desc We first load the _EpiNow2_ package. -```r +``` r library("EpiNow2") +#> +#> Attaching package: 'EpiNow2' +#> The following object is masked from 'package:stats': +#> +#> Gamma ``` We then set the number of cores to use. We will want to run 4 MCMC chains in parallel so we set this to 4. -```r +``` r options(mc.cores = 4) ``` @@ -74,7 +79,7 @@ They are defined using a common interface that involves functions that are named For help with this function, see its manual page -```r +``` r ?EpiNow2::Distributions ``` @@ -82,8 +87,9 @@ In all cases, the distributions given can be *fixed* (i.e. have no uncertainty) For example, to define a fixed gamma distribution with mean 3, standard deviation (sd) 1 and maximum value 10, you would write -```r -Gamma(mean = 3, sd = 1, max = 10) +``` r +fixed_gamma <- Gamma(mean = 3, sd = 1, max = 10) +fixed_gamma #> - gamma distribution (max: 10): #> shape: #> 9 @@ -91,17 +97,27 @@ Gamma(mean = 3, sd = 1, max = 10) #> 3 ``` +which looks like this when plotted + + +``` r +plot(fixed_gamma) +``` + +![plot of chunk plot_fixed_gamma](estimate_infections_workflow-plot_fixed_gamma-1.png) + If distributions are variable, the values with uncertainty are treated as [prior probability densities](https://en.wikipedia.org/wiki/Prior_probability) in the Bayesian inference framework used by _EpiNow2_, i.e. they are estimated as part of the inference. For example, to define a variable gamma distribution where uncertainty in the mean is given by a normal distribution with mean 3 and sd 2, and uncertainty in the standard deviation is given by a normal distribution with mean 1 and sd 0.1, with a maximum value 10, you would write -```r -Gamma(mean = Normal(3, 2), sd = Normal(1, 0.1), max = 10) -#> Warning in new_dist_spec(params, "gamma"): Uncertain gamma distribution -#> specified in terms of parameters that are not the "natural" parameters of the -#> distribution (shape, rate). Converting using a crude and very approximate -#> method that is likely to produce biased results. If possible, it is preferable -#> to specify the distribution directly in terms of the natural parameters. +``` r +uncertain_gamma <- Gamma(mean = Normal(3, 2), sd = Normal(1, 0.1), max = 10) +#> Warning: ! Uncertain gamma distribution specified in terms of parameters that are not the "natural" +#> parameters of the distribution shape and rate. +#> ℹ Converting using a crude and very approximate method that is likely to produce biased results. +#> ℹ If possible it is preferable to specify the distribution directly in terms of the natural +#> parameters. +uncertain_gamma #> - gamma distribution (max: 10): #> shape: #> - normal distribution: @@ -117,6 +133,15 @@ Gamma(mean = Normal(3, 2), sd = Normal(1, 0.1), max = 10) #> 1.4 ``` +which looks like this when plotted + + +``` r +plot(uncertain_gamma) +``` + +![plot of chunk plot_uncertain_gamma](estimate_infections_workflow-plot_uncertain_gamma-1.png) + Note the warning about parameters. We used the mean and standard deviation to define this distribution with uncertain parameters, but it would be better to use the "natural" parameters of the gamma distribution, shape and rate, for example using the values estimate and reported back after calling the previous command. @@ -134,7 +159,7 @@ In _EpiNow2_, the generation time distribution is defined by a call to `gt_opts( For example, to define the generation time as gamma distributed with uncertain mean centered on 3 and sd centered on 1 with some uncertainty, a maximum value of 10 and weighted by the number of case data points we could use the shape and rate parameters suggested above (though notes that this will only very approximately produce the uncertainty in mean and standard deviation stated there): -```r +``` r generation_time <- Gamma( shape = Normal(9, 2.5), rate = Normal(3, 1.4), max = 10 ) @@ -151,14 +176,15 @@ Often, such counts are composed of multiple delays for which we only have separa In this case, we can combine multiple delays with the plus (`+`) operator, e.g. -```r +``` r incubation_period <- LogNormal( meanlog = Normal(1.6, 0.05), sdlog = Normal(0.5, 0.05), max = 14 ) reporting_delay <- LogNormal(meanlog = 0.5, sdlog = 0.5, max = 10) -incubation_period + reporting_delay +combined_delays <- incubation_period + reporting_delay +combined_delays #> Composite distribution: #> - lognormal distribution (max: 14): #> meanlog: @@ -180,18 +206,27 @@ incubation_period + reporting_delay #> 0.5 ``` +We can visualise this combined delay + + +``` r +plot(combined_delays) +``` + +![plot of chunk plot_combined_delay](estimate_infections_workflow-plot_combined_delay-1.png) + In _EpiNow2_, the reporting delay distribution is defined by a call to `delay_opts()`, a function that takes a single argument defined as a `dist_spec` object (returned by `LogNormal()`, `Gamma()` etc.). For example, if our observations were by symptom onset we would use -```r +``` r delay_opts(incubation_period) ``` If they were by date of lab confirmation that happens with a delay given by `reporting_delay`, we would use -```r +``` r delay <- incubation_period + reporting_delay delay_opts(delay) ``` @@ -209,7 +244,7 @@ In _EpiNow2_, this can be done using the `estimate_truncation()` method which re For more details on the model used for this, see the [estimate_truncation](estimate_truncation.html) vignette. -```r +``` r ?estimate_truncation ``` @@ -227,7 +262,7 @@ In _EpiNow2_ we can specify the proportion of infections that we expect to be ob For example, if we think that 40% (with standard deviation 1%) of infections end up in the data as observations we could specify. -```r +``` r obs_scale <- list(mean = 0.4, sd = 0.01) obs_opts(scale = obs_scale) ``` @@ -241,7 +276,7 @@ It can be changed using the `rt_opts()` function. For example, if the user believes that at the very start of the data the reproduction number was 2, with uncertainty in this belief represented by a standard deviation of 1, they would use -```r +``` r rt_prior <- list(mean = 2, sd = 1) rt_opts(prior = rt_prior) ``` @@ -258,14 +293,15 @@ All the options are combined in a call to the `estimate_infections()` function. For example, using some of the options described above one could call -```r +``` r def <- estimate_infections( example_confirmed, generation_time = gt_opts(generation_time), delays = delay_opts(delay), rt = rt_opts(prior = rt_prior) ) -#> Warning: There were 3 divergent transitions after warmup. See +#> DEBUG [2024-09-26 20:23:03] estimate_infections: Running in exact mode for 2000 samples (across 4 chains each with a warm up of 250 iterations each) and 147 time steps of which 7 are a forecast +#> Warning: There were 6 divergent transitions after warmup. See #> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup #> to find out why this is a problem and how to eliminate them. #> Warning: Examine the pairs() plot to diagnose sampling problems @@ -284,7 +320,7 @@ The package contains functionality to estimate the delay and scaling between mul To visualise the results one can use the `plot()` function that comes with the package -```r +``` r plot(def) ``` diff --git a/vignettes/estimate_infections_workflow.Rmd.orig b/vignettes/estimate_infections_workflow.Rmd.orig index 1d2da81d7..ca1297472 100644 --- a/vignettes/estimate_infections_workflow.Rmd.orig +++ b/vignettes/estimate_infections_workflow.Rmd.orig @@ -33,10 +33,10 @@ Obtaining a good and full understanding of the data being used is an important f _EpiNow2_ expects data in the format of a data frame with two columns, `date` and `confirm`, where `confirm` stands for the number of confirmed counts - although in reality this can be applied to any data including suspected cases and lab-confirmed outcomes. The user might already have the data as such a time series provided, for example, on public dashboards or directly from public health authorities. Alternatively, they can be constructed from individual-level data, for example using the [incidence2](https://CRAN.R-project.org/package=incidence2) R package. -An example data set called `example_confirm` is included in the package: +An example data set called `example_confirmed` is included in the package: ```{r} -head(example_confirmed) +head(EpiNow2::example_confirmed) ``` Any estimation procedure is only as good as the data that feeds into it. @@ -79,14 +79,28 @@ In all cases, the distributions given can be *fixed* (i.e. have no uncertainty) For example, to define a fixed gamma distribution with mean 3, standard deviation (sd) 1 and maximum value 10, you would write ```{r} -Gamma(mean = 3, sd = 1, max = 10) +fixed_gamma <- Gamma(mean = 3, sd = 1, max = 10) +fixed_gamma +``` + +which looks like this when plotted + +```{r plot_fixed_gamma} +plot(fixed_gamma) ``` If distributions are variable, the values with uncertainty are treated as [prior probability densities](https://en.wikipedia.org/wiki/Prior_probability) in the Bayesian inference framework used by _EpiNow2_, i.e. they are estimated as part of the inference. For example, to define a variable gamma distribution where uncertainty in the mean is given by a normal distribution with mean 3 and sd 2, and uncertainty in the standard deviation is given by a normal distribution with mean 1 and sd 0.1, with a maximum value 10, you would write ```{r} -Gamma(mean = Normal(3, 2), sd = Normal(1, 0.1), max = 10) +uncertain_gamma <- Gamma(mean = Normal(3, 2), sd = Normal(1, 0.1), max = 10) +uncertain_gamma +``` + +which looks like this when plotted + +```{r plot_uncertain_gamma} +plot(uncertain_gamma) ``` Note the warning about parameters. @@ -128,7 +142,14 @@ incubation_period <- LogNormal( max = 14 ) reporting_delay <- LogNormal(meanlog = 0.5, sdlog = 0.5, max = 10) -incubation_period + reporting_delay +combined_delays <- incubation_period + reporting_delay +combined_delays +``` + +We can visualise this combined delay + +```{r plot_combined_delay} +plot(combined_delays) ``` In _EpiNow2_, the reporting delay distribution is defined by a call to `delay_opts()`, a function that takes a single argument defined as a `dist_spec` object (returned by `LogNormal()`, `Gamma()` etc.).