From 6129790647625a65a32fb8b6c811a05e42de8993 Mon Sep 17 00:00:00 2001
From: thomaszwagerman
Date: Wed, 6 Nov 2024 17:38:10 +0000
Subject: [PATCH] readme and vignette updates for timeline functions

---
 R/timeline.R            | 26 +++++++++---------
 R/timeline_group.R      |  7 ++++-
 README.Rmd              |  3 +++
 README.md               |  6 +++++
 man/timeline.Rd         | 21 +++++++++------
 man/timeline_group.Rd   |  7 ++++-
 vignettes/butterfly.Rmd | 59 ++++++++++++++++++++++++++++++++++++++---
 7 files changed, 102 insertions(+), 27 deletions(-)

diff --git a/R/timeline.R b/R/timeline.R
index b358e02..f702cfe 100644
--- a/R/timeline.R
+++ b/R/timeline.R
@@ -13,15 +13,7 @@
 #' difference between timesteps in a dataset should not exceed the
 #' `expected_lag`.
 #'
-#' @param df_current data.frame, the newest/current version of dataset x.
-#' @param datetime_variable string, the "datetime" variable that should be
-#' checked for continuity.
-#' @param expected_lag numeric, the acceptable difference between timestep for
-#' a timeseries to be classed as continuous. Any difference greater than
-#' `expected_lag` will indicate a timeseries is not continuous. Default is 1.
-#' The smallest units of measurement present in the column will be used. For
-#' example in a column formatted YYYY-MM, month will be used. In a column
-#' formatted YYYY-MM-DD day will be used.
+#' @inheritParams timeline_group
 #'
 #' @seealso [timeline_group()]
 #'
@@ -29,13 +21,19 @@
 #' are more than one continuous timeseries within the dataset.
 #'
 #' @examples
-#' # This example contains no differences with previous data
-#' # Our datetime column is formatted YYYY-MM-DD, and we expect an observation
-#' # every month, therefore our expected lag is 31 (days).
+#' # A nice continuous dataset should return TRUE
 #' butterfly::timeline(
-#'   butterflycount$april,
+#'   forestprecipitation$january,
 #'   datetime_variable = "time",
-#'   expected_lag = 31
+#'   expected_lag = 1
+#' )
+#'
+#' # In February, our imaginary rain gauge's onboard computer had a failure.
+#' # The timestamp was reset to 1970-01-01
+#' butterfly::timeline(
+#'   forestprecipitation$february,
+#'   datetime_variable = "time",
+#'   expected_lag = 1
 #' )
 #'
 #' @export
diff --git a/R/timeline_group.R b/R/timeline_group.R
index 9102da4..6b650da 100644
--- a/R/timeline_group.R
+++ b/R/timeline_group.R
@@ -33,8 +33,13 @@
 #' data and `timelag` which specifies the time lags between rows.
 #'
 #' @examples
+#' # A nice continuous dataset should return TRUE
+#' # In February, our imaginary rain gauge's onboard computer had a failure.
+#' # The timestamp was reset to 1970-01-01
+#'
+#' # We want to group these different distinct continuous sequences:
 #' butterfly::timeline_group(
-#'   forestprecipitation$january,
+#'   forestprecipitation$february,
 #'   datetime_variable = "time",
 #'   expected_lag = 1
 #' )
diff --git a/README.Rmd b/README.Rmd
index 1d26a64..f5fc07c 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -52,7 +52,10 @@ The butterfly package contains the following:
  * `butterfly::catch()` - returns rows which contain previously changed values in a dataframe.
  * `butterfly::release()` - drops rows which contain previously changed values, and returns a dataframe containing new and unchanged rows.
  * `butterfly::create_object_list()` - returns a list of objects required by all of `loupe()`, `catch()` and `release()`. Contains underlying functionality.
+ * `butterfly::timeline()` - check if a timeseries is continuous between timesteps.
+ * `butterfly::timeline_group()` - group distinct, but continuous sequences of a timeseries.
  * `butterflycount` - a list of monthly dataframes, which contain fictional butterfly counts for a given date.
+ * `forestprecipitation` - a list of monthly dataframes, which contain fictional daily precipitation measurements for a given date.
 
 ## Examples
 
diff --git a/README.md b/README.md
index 22c7936..ee53193 100644
--- a/README.md
+++ b/README.md
@@ -67,8 +67,14 @@ The butterfly package contains the following:
 - `butterfly::create_object_list()` - returns a list of objects required
   by all of `loupe()`, `catch()` and `release()`. Contains underlying
   functionality.
+- `butterfly::timeline()` - check if a timeseries is continuous between
+  timesteps.
+- `butterfly::timeline_group()` - group distinct, but continuous
+  sequences of a timeseries.
 - `butterflycount` - a list of monthly dataframes, which contain
   fictional butterfly counts for a given date.
+- `forestprecipitation` - a list of monthly dataframes, which contain
+  fictional daily precipitation measurements for a given date.
 
 ## Examples
 
diff --git a/man/timeline.Rd b/man/timeline.Rd
index 00a9b33..79ed1e5 100644
--- a/man/timeline.Rd
+++ b/man/timeline.Rd
@@ -15,9 +15,8 @@ checked for continuity.}
 \item{expected_lag}{numeric, the acceptable difference between timestep for
 a timeseries to be classed as continuous. Any difference greater than
 \code{expected_lag} will indicate a timeseries is not continuous. Default is 1.
-The smallest units of measurement present in the column will be used. For
-example in a column formatted YYYY-MM, month will be used. In a column
-formatted YYYY-MM-DD day will be used.}
+The smallest units of measurement present in the column will be used. In a
+column formatted YYYY-MM-DD day will be used.}
 }
 \value{
 A boolean, TRUE if the timeseries is continuous, and FALSE if there
@@ -39,13 +38,19 @@ difference between timesteps in a dataset should not exceed the
 \code{expected_lag}.
 }
 \examples{
-# This example contains no differences with previous data
-# Our datetime column is formatted YYYY-MM-DD, and we expect an observation
-# every month, therefore our expected lag is 31 (days).
+# A nice continuous dataset should return TRUE
 butterfly::timeline(
-  butterflycount$april,
+  forestprecipitation$january,
   datetime_variable = "time",
-  expected_lag = 31
+  expected_lag = 1
+)
+
+# In February, our imaginary rain gauge's onboard computer had a failure.
+# The timestamp was reset to 1970-01-01
+butterfly::timeline(
+  forestprecipitation$february,
+  datetime_variable = "time",
+  expected_lag = 1
 )
 
 }
diff --git a/man/timeline_group.Rd b/man/timeline_group.Rd
index 268383a..e41715d 100644
--- a/man/timeline_group.Rd
+++ b/man/timeline_group.Rd
@@ -45,8 +45,13 @@ logic in case_when().
 }
 }
 \examples{
+# A nice continuous dataset should return TRUE
+# In February, our imaginary rain gauge's onboard computer had a failure.
+# The timestamp was reset to 1970-01-01
+
+# We want to group these different distinct continuous sequences:
 butterfly::timeline_group(
-  forestprecipitation$january,
+  forestprecipitation$february,
   datetime_variable = "time",
   expected_lag = 1
 )
diff --git a/vignettes/butterfly.Rmd b/vignettes/butterfly.Rmd
index eb434d1..5a4ce31 100644
--- a/vignettes/butterfly.Rmd
+++ b/vignettes/butterfly.Rmd
@@ -31,7 +31,7 @@ butterflycount
 
 This dataset is entirely fictional, and merely included to aid demonstrating butterfly's functionality.
 
-## Examining datasets: loupe()
+## Examining datasets: `loupe()`
 
 We can use `butterfly::loupe()` to examine in detail whether previous values have changed.
 
@@ -70,7 +70,7 @@ butterfly::loupe(
 
 Call `?waldo::compare()` to see the full list of arguments.
 
-## Extracting unexpected changes: catch()
+## Extracting unexpected changes: `catch()`
 
 You might want to return changed rows as a dataframe. For this `butterfly::catch()`is provided.
 
@@ -86,7 +86,7 @@ df_caught <- butterfly::catch(
 df_caught
 ```
 
-## Dropping unexpecrted changes: release()
+## Dropping unexpected changes: `release()`
 
 Conversely, `butterfly::release()` drops all rows which had changed from the previous version.
 Note it retains new rows, as these were expected.
@@ -114,6 +114,59 @@ df_release_without_new
 ```
 
 
+## Checking for continuity: `timeline()`
+To check if a timeseries is continuous, `timeline()` and `timeline_group()` are
+provided. Even if a timeseries does not contain obvious gaps, this does not
+automatically mean it is also continuous.
+
+Measuring instruments can have different behaviours when they fail. For
+example, during power failure an internal clock could reset to "1970-01-01",
+or the manufacturing date (say, "2021-01-01"). This leads to unpredictable
+ways of checking if a dataset is continuous.
+
+To check if a timeseries is continuous:
+
+```{r check_continuity}
+butterfly::timeline(
+  forestprecipitation$january,
+  datetime_variable = "time",
+  expected_lag = 1
+  )
+```
+
+The above is a nice continuous dataset, where there is no more than a difference
+of 1 day between timesteps.
+
+However, in February our imaginary rain gauge's onboard computer had a failure.
+
+The timestamp was reset to 1970-01-01:
+
+```{r not_continuous}
+forestprecipitation$february
+
+butterfly::timeline(
+  forestprecipitation$february,
+  datetime_variable = "time",
+  expected_lag = 1
+  )
+```
+
+## Grouping distinct continuous sequences: `timeline_group()`
+
+If we wanted to group chunks of our timeseries that are distinct, or broken up
+in some way, but still continuous, we can use `timeline_group()`:
+
+```{r timeline_group}
+butterfly::timeline_group(
+  forestprecipitation$february,
+  datetime_variable = "time",
+  expected_lag = 1
+  )
+```
+
+We now have groups 1 & 2, which are both continuous sets of data, but there is
+no continuity between them.
+
 ## Using `butterfly` in a data processing pipeline
 
 If you would like to know more about using `butterfly` in an operational data processing pipeline, please refer to the article on [using `butterfly` in an operational pipeline](https://thomaszwagerman.github.io/butterfly/articles/butterfly_in_pipeline.html).