Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updated interface for accumulation #851

Merged
merged 40 commits into from
Nov 25, 2024
Merged
Show file tree
Hide file tree
Changes from 37 commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
d3ed73a
add new fill_missing function
sbfnk Nov 10, 2024
9cc76e1
add new accumulate variable
sbfnk Nov 10, 2024
f01834f
update stan model
sbfnk Nov 10, 2024
f95f262
update checks
sbfnk Nov 10, 2024
751798a
expand docs
sbfnk Nov 10, 2024
fb2975f
create accumulate column when buffeting
sbfnk Nov 10, 2024
91e66fa
use helper function for deprecation
sbfnk Nov 10, 2024
be9642f
render docs
sbfnk Nov 10, 2024
071f7d2
update tests
sbfnk Nov 10, 2024
daaa8d8
add news item
sbfnk Nov 10, 2024
7ad96f8
linting
sbfnk Nov 11, 2024
8580dcc
fix arg name
sbfnk Nov 11, 2024
450fa78
fix bugs
sbfnk Nov 11, 2024
abbc0c6
do all reports
sbfnk Nov 11, 2024
6d45055
update test of deprecated functionality
sbfnk Nov 11, 2024
3288733
export function
sbfnk Nov 11, 2024
5b29e57
fix example
sbfnk Nov 11, 2024
1f73081
add global var
sbfnk Nov 11, 2024
0b58c2c
add missing man page
sbfnk Nov 11, 2024
87f5c0f
fix issues identified in checks
sbfnk Nov 11, 2024
b7ccaeb
fix recommended workflow
sbfnk Nov 11, 2024
08814c4
remove obsolete arguments
sbfnk Nov 11, 2024
fa16009
Apply suggestions from code review
sbfnk Nov 12, 2024
6a0d9b2
move accumulation to after truncation
sbfnk Nov 12, 2024
5cb4ee9
lint
sbfnk Nov 12, 2024
ce2bd82
rename arguments
sbfnk Nov 21, 2024
f130264
make "confirm" the default for `obs_column`
sbfnk Nov 21, 2024
9d2f63d
add "by" arugment to `fill_missing()`
sbfnk Nov 21, 2024
fb92547
clarify `initial_accumulate` argument
sbfnk Nov 21, 2024
7d43fd6
add assertions suggested by @jamesmbaazam
sbfnk Nov 22, 2024
3e192d1
Apply suggestions from code review
sbfnk Nov 22, 2024
3c9843b
Merge branch 'main' into accumulate
sbfnk Nov 22, 2024
7b25dc5
update example
sbfnk Nov 22, 2024
fd020b6
only assert if not missing
sbfnk Nov 22, 2024
5dc48de
set column order
sbfnk Nov 22, 2024
ceb13e7
update tests
sbfnk Nov 22, 2024
9a8ef8e
render design doc from Rmd to check correctness
sbfnk Nov 22, 2024
abb8d1c
accumulate after truncation
sbfnk Nov 23, 2024
d25f0d1
add more documentation
sbfnk Nov 23, 2024
fc91c49
clarify truncation happens before accumulation
sbfnk Nov 25, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,7 @@ export(extract_CrIs)
export(extract_inits)
export(extract_samples)
export(extract_stan_param)
export(fill_missing)
export(fix_dist)
export(fix_parameters)
export(forecast_infections)
Expand Down Expand Up @@ -137,6 +138,7 @@ importFrom(cli,col_blue)
importFrom(cli,col_red)
importFrom(data.table,":=")
importFrom(data.table,.N)
importFrom(data.table,CJ)
importFrom(data.table,as.data.table)
importFrom(data.table,copy)
importFrom(data.table,data.table)
Expand Down
23 changes: 18 additions & 5 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,13 +1,26 @@
# EpiNow2 (development version)

## Documentation
## Model changes

- Brought the docs on `alpha_sd` up to date with the code change from prior PR #853. By @zsusswein in #862 and reviewed by @jamesmbaazam.
- The models now support more complex patterns of aggregating reported cases by accumulating them over multiple time points, as well as mixtures of accumulation and missingness via the new `fill_missing()` function and a logical `accumulate` column that can be supplied with the data. By @sbfnk in #851 and reviewed by @seabbs and @jamesmbaazam..

## Model changes
```r
# Deprecated
data |>
estimate_infections(obs_opts(na = "accumulate"))

- A bug was fixed where the initial growth was never estimated (i.e. the prior mean was always zero). By @sbfnk in #853 and reviewed by @seabbs.
- A bug was fixed where an internal function for applying a default cdf cutoff failed due to a difference a vector length issue. By @jamesmbaazam in #858 and reviewed by @sbfnk.
# Recommended workflow e.g. for weekly incidence data
sbfnk marked this conversation as resolved.
Show resolved Hide resolved
data |>
fill_missing(missing = "accumulate", initial_accumulate = 7) |>
estimate_infections()
```

- A bug was fixed where the initial growth was never estimated (i.e. the prior mean was always zero). By @sbfnk in #853 and reviewed by @seabbs.
- A bug was fixed where an internal function for applying a default cdf cutoff failed due to a difference a vector length issue. By @jamesmbaazam in #858 and reviewed by @sbfnk.

## Documentation

- Brought the docs on `alpha_sd` up to date with the code change from prior PR #853. By @zsusswein in #862 and reviewed by @jamesmbaazam.

# EpiNow2 1.6.1

Expand Down
13 changes: 6 additions & 7 deletions R/checks.R
Original file line number Diff line number Diff line change
Expand Up @@ -15,29 +15,28 @@
#' "estimate_infections", "estimate_truncation", or "estimate_secondary".
#' This is used to determine which checks to perform on the data input.
#' @importFrom checkmate assert_data_frame assert_date assert_names
#' assert_numeric
#' assert_numeric
#' @importFrom purrr walk
#' @importFrom rlang arg_match
#' @return Called for its side effects.
#' @keywords internal
check_reports_valid <- function(data,
model = c(
"estimate_infections",
"estimate_truncation",
"estimate_secondary"
)) {
# Check that the case time series (reports) is a data frame
assert_data_frame(data)
# Perform checks depending on the model to the data is meant to be used with
model <- arg_match(model)

assert_date(data$date, any.missing = FALSE)
if (model == "estimate_secondary") {
# Check that data has the right column names
assert_names(
names(data),
must.include = c("date", "primary", "secondary")
)
# Check that the data data.frame has the right column types
assert_date(data$date, any.missing = FALSE)
assert_numeric(data$primary, lower = 0)
assert_numeric(data$secondary, lower = 0)
} else {
Expand All @@ -46,10 +45,10 @@ check_reports_valid <- function(data,
names(data),
must.include = c("date", "confirm")
)
# Check that the data data.frame has the right column types
assert_date(data$date, any.missing = FALSE)
assert_numeric(data$confirm, lower = 0)
}
assert_logical(data$accumulate, null.ok = TRUE)
return(invisible(data))
}

#' Validate probability distribution for passing to stan
Expand Down Expand Up @@ -224,7 +223,7 @@ test_data_complete <- function(data, cols_to_check) {

#' Cross-check treatment of `NA` in obs_opts() against input data
#'
#' @description `r lifecycle::badge("experimental")`
#' @description `r lifecycle::badge("deprecated")`
#'
#' This function checks the input data for implicit and/or explicit missingness
#' and checks if the user specified `na = "missing"` in [obs_opts()].
Expand Down
9 changes: 7 additions & 2 deletions R/create.R
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,10 @@ create_clean_reported_cases <- function(data, horizon = 0,
}
reported_cases[is.na(confirm), confirm := fill]
reported_cases[, "average_7_day" := NULL]
## set accumulate to FALSE in added rows
if ("accumulate" %in% colnames(reported_cases)) {
reported_cases[is.na(accumulate), accumulate := FALSE]
}
return(reported_cases)
}

Expand Down Expand Up @@ -484,7 +488,6 @@ create_obs_model <- function(obs = obs_opts(), dates) {
obs_scale = as.integer(obs$scale$sd > 0 || obs$scale$mean != 1),
obs_scale_mean = obs$scale$mean,
obs_scale_sd = obs$scale$sd,
accumulate = obs$accumulate,
likelihood = as.numeric(obs$likelihood),
return_likelihood = as.numeric(obs$return_likelihood)
)
Expand Down Expand Up @@ -525,14 +528,16 @@ create_obs_model <- function(obs = obs_opts(), dates) {
create_stan_data <- function(data, seeding_time,
rt, gp, obs, horizon,
backcalc, shifted_cases) {

cases <- data[(seeding_time + 1):(.N - horizon)]
complete_cases <- create_complete_cases(cases)
cases <- cases$confirm
accumulate <- data[-(1:seeding_time)]$accumulate

stan_data <- list(
cases = complete_cases$confirm,
cases_time = complete_cases$lookup,
any_accumulate = as.integer(any(accumulate)),
accumulate = as.integer(accumulate),
lt = nrow(complete_cases),
shifted_cases = shifted_cases,
t = length(data$date),
Expand Down
18 changes: 12 additions & 6 deletions R/estimate_infections.R
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,13 @@
#' for an example of using `estimate_infections` within the `epinow` wrapper to
#' estimate Rt for Covid-19 in a country from the ECDC data source.
#'
#' @param data A `<data.frame>` of confirmed cases (confirm) by date
#' (date). `confirm` must be numeric and `date` must be in date format.
#' @param data A `<data.frame>` of confirmed cases (confirm) by date (date).
#' `confirm` must be numeric and `date` must be in date format. Optionally
#' this can also have a logical `accumulate` column which indicates whether
#' data should be added to the next data point. This is useful when modelling
sbfnk marked this conversation as resolved.
Show resolved Hide resolved
#' e.g. weekly incidence data. See also the [fill_missing()] function which
#' helps add the `accumulate` column with the desired properties when dealing
#' with non-daily data.
#'
#' @param reported_cases Deprecated; use `data` instead.
#'
Expand Down Expand Up @@ -176,10 +181,11 @@ estimate_infections <- function(data,
data = dirty_reported_cases,
cols_to_check = c("date", "confirm")
)
# Fill missing dates
reported_cases <- default_fill_missing_obs(data, obs, "confirm")
# Create clean and complete cases
# Order cases
reported_cases <- create_clean_reported_cases(
data, horizon,
reported_cases, horizon,
filter_leading_zeros = filter_leading_zeros,
zero_threshold = zero_threshold
)
Expand All @@ -197,9 +203,9 @@ estimate_infections <- function(data,
min(reported_cases$date) - 1,
by = "days"
),
confirm = 0, breakpoint = 0
confirm = 0, accumulate = FALSE, breakpoint = 0
),
reported_cases
reported_cases[, .(date, confirm, accumulate, breakpoint)]
))

shifted_cases <- create_shifted_cases(
Expand Down
9 changes: 7 additions & 2 deletions R/estimate_secondary.R
Original file line number Diff line number Diff line change
Expand Up @@ -190,7 +190,10 @@ estimate_secondary <- function(data,
data = reports,
cols_to_check = c("date", "primary", "secondary")
)
secondary_reports_dirty <- reports[, list(date, confirm = secondary)]
reports <- default_fill_missing_obs(reports, obs, "secondary")

secondary_reports_dirty <-
reports[, list(date, confirm = secondary, accumulate)]
secondary_reports <- create_clean_reported_cases(
secondary_reports_dirty,
filter_leading_zeros = filter_leading_zeros,
Expand Down Expand Up @@ -225,7 +228,9 @@ estimate_secondary <- function(data,
obs_time = complete_secondary[lookup > burn_in]$lookup - burn_in,
lt = sum(complete_secondary$lookup > burn_in),
burn_in = burn_in,
seeding_time = 0
seeding_time = 0,
any_accumulate = as.integer(any(reports$accumulate > 0)),
accumulate = as.integer(reports$accumulate)
)
# secondary model options
stan_data <- c(stan_data, secondary)
Expand Down
2 changes: 1 addition & 1 deletion R/estimate_truncation.R
Original file line number Diff line number Diff line change
Expand Up @@ -145,7 +145,7 @@ estimate_truncation <- function(data,
)
}
# Validate inputs
walk(data, check_reports_valid, model = "estimate_truncation")
walk(data, check_reports_valid, model = "estimate_infections")
assert_class(truncation, "dist_spec")
assert_class(model, "stanfit", null.ok = TRUE)
assert_numeric(CrIs, lower = 0, upper = 1)
Expand Down
29 changes: 18 additions & 11 deletions R/opts.R
Original file line number Diff line number Diff line change
Expand Up @@ -594,15 +594,7 @@ gp_opts <- function(basis_prop = 0.2,
#' scale) or a list with numeric elements mean (`mean`) and standard deviation
#' (`sd`) defining a normally distributed scaling factor. Defaults to 1, i.e.
#' no scaling.
#' @param na Character. Options are "missing" (the default) and "accumulate".
#' This determines how NA values in the data are interpreted. If set to
#' "missing", any NA values in the observation data set will be interpreted as
#' missing and skipped in the likelihood. If set to "accumulate", modelled
#' observations will be accumulated and added to the next non-NA data point.
#' This can be used to model incidence data that is reported at less than
#' daily intervals. If set to "accumulate", the first data point is not
#' included in the likelihood but used only to reset modelled observations to
#' zero.
#' @param na Deprecated; use the [fill_missing()] function instead
#' @param likelihood Logical, defaults to `TRUE`. Should the likelihood be
#' included in the model.
#' @param return_likelihood Logical, defaults to `FALSE`. Should the likelihood
Expand Down Expand Up @@ -631,6 +623,21 @@ obs_opts <- function(family = c("negbin", "poisson"),
return_likelihood = FALSE) {
# NB: This has to be checked first before the na argument is touched anywhere.
na_default_used <- missing(na)
if (!na_default_used) {
lifecycle::deprecate_warn(
"1.7.0",
"obs_opts(na)",
"fill_missing()",
details = c(
paste0(
"If NA values are not to be treated as missing use the ",
"`fill_missing()` function instead."
),
"This argument will be removed in the next release of EpiNow2."
)
)

}
na <- arg_match(na)
if (na == "accumulate") {
#nolint start: duplicate_argument_linter
Expand All @@ -641,8 +648,8 @@ obs_opts <- function(family = c("negbin", "poisson"),
"i" = "This means that the first data point is not included in the
likelihood but used only to reset modelled observations to zero.",
"i" = "{col_red('If the first data point should be included in the
likelihood this can be achieved by adding a data point of arbitrary
value before the first data point.')}"
likelihood this can be achieved by using the `fill_missing()` function
with a non-zero `initial_missing` argument.')}"
),
.frequency = "regularly",
.frequency_id = "obs_opts"
Expand Down
Loading
Loading