Merge pull request #25 from thomaszwagerman/timelines
Adding in continuity checking functions
Showing 21 changed files with 679 additions and 13 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,6 +13,7 @@ Imports: | |
cli, | ||
dplyr, | ||
lifecycle, | ||
rlang, | ||
waldo | ||
Suggests: | ||
knitr, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
#' timeline: check if a timeseries is continuous | ||
#' | ||
#' Check if a timeseries is continuous. Even if a timeseries does not contain | ||
#' obvious gaps, this does not automatically mean it is also continuous. | ||
#' | ||
#' Measuring instruments can have different behaviours when they fail. For | ||
#' example, during power failure an internal clock could reset to "1970-01-01", | ||
#' or the manufacturing date (say, "2021-01-01"). This makes it hard to predict | ||
#' how a break in continuity will appear in a dataset. | ||
#' | ||
#' The `timeline_group()` and `timeline()` functions attempt to give the user | ||
#' control over how to check for continuity by providing an `expected_lag`. The | ||
#' difference between timesteps in a dataset should not exceed the | ||
#' `expected_lag`. | ||
#' | ||
#' @inheritParams timeline_group | ||
#' | ||
#' @seealso [timeline_group()] | ||
#' | ||
#' @returns A logical, TRUE if the timeseries is continuous, and FALSE if there | ||
#' is more than one continuous sequence within the dataset. | ||
#' | ||
#' @examples | ||
#' # A nice continuous dataset should return TRUE | ||
#' butterfly::timeline( | ||
#' forestprecipitation$january, | ||
#' datetime_variable = "time", | ||
#' expected_lag = 1 | ||
#' ) | ||
#' | ||
#' # In February, our imaginary rain gauge's onboard computer had a failure. | ||
#' # The timestamp was reset to 1970-01-01 | ||
#' butterfly::timeline( | ||
#' forestprecipitation$february, | ||
#' datetime_variable = "time", | ||
#' expected_lag = 1 | ||
#' ) | ||
#' | ||
#' @export | ||
timeline <- function( | ||
df_current, | ||
datetime_variable, | ||
expected_lag = 1 | ||
) { | ||
|
||
df_timelines <- timeline_group( | ||
df_current, | ||
datetime_variable, | ||
expected_lag | ||
) | ||
|
||
if (length(unique(df_timelines$timeline_group)) == 1) { | ||
is_continuous <- TRUE | ||
|
||
cli::cat_bullet( | ||
"There are no time lags which are greater than the expected lag: ", | ||
expected_lag, | ||
" ", | ||
units(df_timelines$timelag), | ||
". By this measure, the timeseries is continuous.", | ||
bullet = "tick", | ||
col = "green", | ||
bullet_col = "green" | ||
) | ||
|
||
} else if (length(unique(df_timelines$timeline_group)) > 1 ) { | ||
is_continuous <- FALSE | ||
|
||
cli::cat_bullet( | ||
"There are time lags which are greater than the expected lag: ", | ||
expected_lag, | ||
" ", | ||
units(df_timelines$timelag), | ||
". This indicates the timeseries is not continuous. There are ", | ||
length(unique(df_timelines$timeline_group)), | ||
" distinct continuous sequences. Use `timeline_group()` to extract.", | ||
bullet = "info", | ||
col = "orange", | ||
bullet_col = "orange" | ||
) | ||
} | ||
|
||
return(is_continuous) | ||
} | ||
|
||
|
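The check in `timeline()` comes down to whether any lag between consecutive timestamps exceeds `expected_lag` in either direction. A minimal base-R sketch of that idea follows. It assumes a plain vector of dates rather than a data.frame; `is_continuous_sketch()` and the two example vectors are hypothetical and are not part of the package.

```r
# Hypothetical helper for illustration only; the package itself uses
# timeline_group() and dplyr rather than this function.
is_continuous_sketch <- function(x, expected_lag = 1) {
  # Differences between consecutive timestamps, as a difftime vector
  lags <- diff(x)
  # Continuous when no forward or backward jump exceeds expected_lag
  all(abs(as.numeric(lags)) <= expected_lag)
}

# Made-up data: five consecutive days
january <- as.Date("2024-01-01") + 0:4
# Made-up data: three consecutive days, then a clock reset to 1970-01-01
february <- c(as.Date("2024-02-01") + 0:2, as.Date("1970-01-01") + 0:1)

is_continuous_sketch(january)   # TRUE
is_continuous_sketch(february)  # FALSE

# Differences between Date values are reported in days, which is the unit
# that expected_lag is compared against here
units(diff(january))            # "days"
```

Taking the absolute value of the lag is what catches a clock reset: the jump back to 1970 is a large negative difference, not a large positive one.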
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
#' timeline_group: group distinct continuous sequences of a timeseries | ||
#' | ||
#' If after using `timeline()` you have established a timeseries is not | ||
#' continuous, or if you are working with data where you expect distinct | ||
#' sequences or events, you can use `timeline_group()` to extract and | ||
#' classify different distinct continuous chunks of your data. | ||
#' | ||
#' We attempt to do this without sorting or changing the data, for a couple | ||
#' of reasons: | ||
#' | ||
#' 1. There is no difference in dates: | ||
#' Some instruments might record dates that appear identical, | ||
#' but are still in chronological order. For example, high-frequency data | ||
#' in fractional seconds. This is a rare use case though. | ||
#' | ||
#' 2. Dates are generally ascending/descending, but the instrument has | ||
#' returned to origin. This is probably more common, and will result in a | ||
#' non-continuous dataset, even though the records are still in chronological | ||
#' order. This is something we would like to discover, and it is accounted | ||
#' for in the logic in `case_when()`. | ||
#' | ||
#' @param df_current data.frame, the newest/current version of dataset x. | ||
#' @param datetime_variable string, the "datetime" variable that should be | ||
#' checked for continuity. | ||
#' @param expected_lag numeric, the acceptable difference between timesteps for | ||
#' a timeseries to be classed as continuous. Any difference greater than | ||
#' `expected_lag` will indicate a timeseries is not continuous. Default is 1. | ||
#' The smallest units of measurement present in the column will be used. In a | ||
#' column formatted YYYY-MM-DD, days will be used. | ||
#' | ||
#' @returns A data.frame, identical to `df_current`, but with extra columns | ||
#' `timeline_group`, which assigns a number to each continuous set of | ||
#' data, and `timelag`, which specifies the time lags between rows. | ||
#' | ||
#' @examples | ||
#' # In February, our imaginary rain gauge's onboard computer had a failure. | ||
#' # The timestamp was reset to 1970-01-01. | ||
#' | ||
#' # We want to group these different distinct continuous sequences: | ||
#' butterfly::timeline_group( | ||
#' forestprecipitation$february, | ||
#' datetime_variable = "time", | ||
#' expected_lag = 1 | ||
#' ) | ||
#' | ||
#' @importFrom rlang .data | ||
#' | ||
#' @export | ||
timeline_group <- function( | ||
df_current, | ||
datetime_variable, | ||
expected_lag = 1 | ||
) { | ||
stopifnot("`df_current` must be a data.frame" = is.data.frame(df_current)) | ||
stopifnot("`expected_lag` must be numeric" = is.numeric(expected_lag)) | ||
|
||
# Check if `datetime_variable` is in `df_current` | ||
if (!datetime_variable %in% names(df_current)) { | ||
cli::cli_abort( | ||
"`datetime_variable` must be present in `df_current`" | ||
) | ||
} | ||
|
||
# Check if datetime_variable can be used by lag | ||
if ( | ||
!inherits( | ||
df_current[[datetime_variable]], | ||
c("POSIXct", "POSIXlt", "POSIXt", "Date") | ||
) | ||
) { | ||
cli::cli_abort( | ||
"`datetime_variable` must be of class POSIXct, POSIXlt, POSIXt or Date" | ||
) | ||
} | ||
|
||
# Obtain distinct sequences of continuous measurement | ||
df_timeline <- df_current |> | ||
dplyr::mutate( | ||
timelag = ( | ||
.data[[datetime_variable]] - dplyr::lag( | ||
.data[[datetime_variable]], | ||
1 | ||
) | ||
) | ||
) |> | ||
dplyr::mutate( | ||
timeline_group1 = dplyr::case_when( | ||
# Include negative timelag, for example if a sensor CPU shuts down. | ||
# It can return to its original date (e.g. 1970-01-01 or when it was | ||
# deployed). | ||
is.na(.data$timelag) | .data$timelag > expected_lag | .data$timelag < -expected_lag ~ 1, | ||
TRUE ~ 2 | ||
) | ||
) |> | ||
dplyr::mutate( | ||
timeline_group = cumsum(.data$timeline_group1 == 1) | ||
) |> | ||
dplyr::select( | ||
-"timeline_group1" | ||
) | ||
|
||
return(df_timeline) | ||
} |
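The grouping step in `timeline_group()` flags every row that starts a new sequence and then takes a running count of those flags. The sketch below shows the same idea in base R on a plain date vector, as an illustration and not as the package's implementation. `group_sketch()` and the example data are hypothetical.

```r
# Hypothetical base-R sketch of the grouping step; the package does this
# with dplyr::lag(), dplyr::case_when() and cumsum() inside mutate().
group_sketch <- function(x, expected_lag = 1) {
  # Lag to the previous row; the first row has no predecessor, so NA
  timelag <- c(NA, as.numeric(diff(x)))
  # A new sequence starts at the first row and after any jump, in either
  # direction, that is larger than expected_lag
  starts_new <- is.na(timelag) | abs(timelag) > expected_lag
  # A running count of sequence starts gives each row its group number
  cumsum(starts_new)
}

# Made-up data: three consecutive days, then a clock reset to 1970-01-01
february <- c(as.Date("2024-02-01") + 0:2, as.Date("1970-01-01") + 0:1)

group_sketch(february)
# 1 1 1 2 2
```

Because the rows are never sorted, the two records stamped 1970 stay in their recorded position and form their own group instead of moving to the front of the series.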
Binary file not shown.
Binary file not shown.