readme and vignette updates for timeline functions
thomaszwagerman committed Nov 6, 2024
1 parent b9dfe3a commit 6129790
Showing 7 changed files with 102 additions and 27 deletions.
26 changes: 12 additions & 14 deletions R/timeline.R
@@ -13,29 +13,27 @@
#' difference between timesteps in a dataset should not exceed the
#' `expected_lag`.
#'
-#' @param df_current data.frame, the newest/current version of dataset x.
-#' @param datetime_variable string, the "datetime" variable that should be
-#' checked for continuity.
-#' @param expected_lag numeric, the acceptable difference between timestep for
-#' a timeseries to be classed as continuous. Any difference greater than
-#' `expected_lag` will indicate a timeseries is not continuous. Default is 1.
-#' The smallest units of measurement present in the column will be used. For
-#' example in a column formatted YYYY-MM, month will be used. In a column
-#' formatted YYYY-MM-DD day will be used.
+#' @inheritParams timeline_group
#'
#' @seealso [timeline_group()]
#'
#' @returns A boolean, TRUE if the timeseries is continuous, and FALSE if there
#' is more than one continuous timeseries within the dataset.
#'
#' @examples
-#' # This example contains no differences with previous data
-#' # Our datetime column is formatted YYYY-MM-DD, and we expect an observation
-#' # every month, therefore our expected lag is 31 (days).
#' # A nice continuous dataset should return TRUE
#' butterfly::timeline(
-#' butterflycount$april,
+#' forestprecipitation$january,
#' datetime_variable = "time",
-#' expected_lag = 31
+#' expected_lag = 1
#' )
#'
#' # In February, our imaginary rain gauge's onboard computer had a failure.
#' # The timestamp was reset to 1970-01-01
#' butterfly::timeline(
#' forestprecipitation$february,
#' datetime_variable = "time",
#' expected_lag = 1
#' )
#'
#' @export
7 changes: 6 additions & 1 deletion R/timeline_group.R
@@ -33,8 +33,13 @@
#' data and `timelag` which specifies the time lags between rows.
#'
#' @examples
#' # A nice continuous dataset should return TRUE
#' # In February, our imaginary rain gauge's onboard computer had a failure.
#' # The timestamp was reset to 1970-01-01
#'
#' # We want to group these different distinct continuous sequences:
#' butterfly::timeline_group(
-#' forestprecipitation$january,
+#' forestprecipitation$february,
#' datetime_variable = "time",
#' expected_lag = 1
#' )
3 changes: 3 additions & 0 deletions README.Rmd
@@ -52,7 +52,10 @@ The butterfly package contains the following:
* `butterfly::catch()` - returns rows which contain previously changed values in a dataframe.
* `butterfly::release()` - drops rows which contain previously changed values, and returns a dataframe containing new and unchanged rows.
* `butterfly::create_object_list()` - returns a list of objects required by all of `loupe()`, `catch()` and `release()`. Contains underlying functionality.
* `butterfly::timeline()` - checks whether a timeseries is continuous between timesteps.
* `butterfly::timeline_group()` - groups distinct but continuous sequences of a timeseries.
* `butterflycount` - a list of monthly dataframes, which contain fictional butterfly counts for a given date.
* `forestprecipitation` - a list of monthly dataframes, which contain fictional daily precipitation measurements for a given date.

## Examples

6 changes: 6 additions & 0 deletions README.md
@@ -67,8 +67,14 @@ The butterfly package contains the following:
- `butterfly::create_object_list()` - returns a list of objects required
by all of `loupe()`, `catch()` and `release()`. Contains underlying
functionality.
- `butterfly::timeline()` - checks whether a timeseries is continuous
between timesteps.
- `butterfly::timeline_group()` - groups distinct but continuous
sequences of a timeseries.
- `butterflycount` - a list of monthly dataframes, which contain
fictional butterfly counts for a given date.
- `forestprecipitation` - a list of monthly dataframes, which contain
fictional daily precipitation measurements for a given date.

## Examples

21 changes: 13 additions & 8 deletions man/timeline.Rd

Some generated files are not rendered by default.

7 changes: 6 additions & 1 deletion man/timeline_group.Rd

Some generated files are not rendered by default.

59 changes: 56 additions & 3 deletions vignettes/butterfly.Rmd
@@ -31,7 +31,7 @@ butterflycount

This dataset is entirely fictional, and merely included to aid demonstrating butterfly's functionality.

-## Examining datasets: loupe()
+## Examining datasets: `loupe()`

We can use `butterfly::loupe()` to examine in detail whether previous values have changed.

@@ -70,7 +70,7 @@ butterfly::loupe(

Call `?waldo::compare()` to see the full list of arguments.

-## Extracting unexpected changes: catch()
+## Extracting unexpected changes: `catch()`

You might want to return changed rows as a dataframe. For this, `butterfly::catch()` is provided.

@@ -86,7 +86,7 @@ df_caught <- butterfly::catch(
df_caught
```

-## Dropping unexpecrted changes: release()
+## Dropping unexpected changes: `release()`

Conversely, `butterfly::release()` drops all rows which had changed from the previous version. Note it retains new rows, as these were expected.

@@ -114,6 +114,59 @@ df_release_without_new
```

## Checking for continuity: `timeline()`
To check whether a timeseries is continuous, `timeline()` and `timeline_group()` are
provided. A timeseries with no obvious gaps is not necessarily continuous.

Measuring instruments can behave differently when they fail. For
example, during a power failure an internal clock could reset to "1970-01-01",
or to the manufacturing date (say, "2021-01-01"). This makes it hard to predict
how a break in continuity will appear in a dataset.

To check if a timeseries is continuous:

```{r check_continuity}
butterfly::timeline(
forestprecipitation$january,
datetime_variable = "time",
expected_lag = 1
)
```

The above is a nice continuous dataset, where the difference between
consecutive timesteps is never more than 1 day.

However, in February our imaginary rain gauge's onboard computer had a failure.

The timestamp was reset to 1970-01-01:

```{r not_continuous}
forestprecipitation$february
butterfly::timeline(
forestprecipitation$february,
datetime_variable = "time",
expected_lag = 1
)
```
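
The idea behind the check can be sketched in base R. This is a rough illustration only, not butterfly's implementation: `is_continuous_sketch()` and the toy date vectors are hypothetical, and treating any step whose absolute size exceeds `expected_lag` as a break is an assumption made for this sketch.

```r
# Hypothetical helper, not part of butterfly: a base R sketch of a lag check.
is_continuous_sketch <- function(datetimes, expected_lag = 1) {
  # Differences between consecutive timesteps, expressed in days
  lags <- as.numeric(diff(as.Date(datetimes)), units = "days")
  # Assumption: a jump in either direction larger than expected_lag is a break
  all(abs(lags) <= expected_lag)
}

# Toy data (made up): five consecutive days
dates_ok <- as.Date("2024-01-01") + 0:4
# Toy data (made up): the clock resets to 1970-01-01 after three days
dates_reset <- c(as.Date("2024-02-01") + 0:2, as.Date("1970-01-01") + 0:2)

is_continuous_sketch(dates_ok)    # TRUE
is_continuous_sketch(dates_reset) # FALSE
```

Using the absolute difference means a clock jumping backwards is flagged as well as a forward gap; whether `timeline()` handles backwards jumps this way is not stated here.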

## Grouping distinct continuous sequences: `timeline_group()`

If we wanted to group chunks of our timeseries that are distinct, or broken up
in some way, but still continuous, we can use `timeline_group()`:

```{r timeline_group}
butterfly::timeline_group(
forestprecipitation$february,
datetime_variable = "time",
expected_lag = 1
)
```

We now have groups 1 & 2, which are both continuous sets of data, but there is
no continuity between them.
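
One way to picture this grouping is a running count of breaks. The sketch below uses base R and is not butterfly's implementation: `group_sequences_sketch()` and the toy dates are hypothetical, and the break rule (absolute lag greater than `expected_lag`) is an assumption.

```r
# Hypothetical helper, not part of butterfly: label continuous runs of dates.
group_sequences_sketch <- function(datetimes, expected_lag = 1) {
  # Differences between consecutive timesteps, expressed in days
  lags <- as.numeric(diff(as.Date(datetimes)), units = "days")
  # The first row never starts a new group; later rows do if the lag is too big
  breaks <- c(FALSE, abs(lags) > expected_lag)
  # Each break increments the group label, so labels start at 1
  cumsum(breaks) + 1L
}

# Toy data (made up): three normal days, then a reset to 1970-01-01
dates_reset <- c(as.Date("2024-02-01") + 0:2, as.Date("1970-01-01") + 0:2)

group_sequences_sketch(dates_reset) # 1 1 1 2 2 2
```

Each label marks a run that is continuous on its own, mirroring groups 1 and 2 above.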

## Using `butterfly` in a data processing pipeline

If you would like to know more about using `butterfly` in an operational data processing pipeline, please refer to the article on [using `butterfly` in an operational pipeline](https://thomaszwagerman.github.io/butterfly/articles/butterfly_in_pipeline.html).