-
Notifications
You must be signed in to change notification settings - Fork 537
Time date processing for riak_ts
There are several overlapping tasks that arise from recent discussions regarding desired time features. These include:
- Allow users to enter
ISO 8601 time/date strings
(including timezones) in SQL (both input and query).
- Dates will be converted to UNIX epoch for storage.
- If a timezone is not supplied, a decision must be made: reject, assume UTC, or assume system default timezone.
- Leap seconds make queries much trickier since the same UNIX timestamp occurs on two different days.
- A failure by the operators to maintain updated system timezone files would potentially result in unexpected behavior.
- Allow users to use time operations in queries (see Postgres' doc for examples).
- Display resulting timestamps in
riak-shell
as ISO 8601 format.- The output could be UTC, the system's timezone, drawn from the table definition, or even specified as part of the query.
- Allow operators to specify alternative timezones as the default for a table.
- Impacts quantum boundaries.
- May impact output from
riak-shell
if timestamps are processed into ISO 8601 andriak-shell
has access to the table metadata. - Probably impact default timezone on input and query date strings.
- Would not impact storage: dates would still be stored as UTC.
- A failure by the operators to maintain updated system timezone files would potentially result in unexpected behavior.
- Allow operators to specify explicit integer values as a quantum function.
- Allows the database to have any zero point that makes sense for the application.
- Makes it easier to semantically treat the values as something other than time values (e.g. geolocation).
- Allow operators to define the meaning of a timestamp increment.
- Removes the assumption that all time values are in milliseconds.
- Might, for simplicity, require an integer-based quantum function.
- Allow clients to determine any above table-specific time-related metadata for display purposes.
- Allow clients to set a session-specific time zone offset (see Oracle for an illustration).
- Implement
date
andtime
data types.
Also, there's clearly a significant documentation task: we need to make certain the current assumptions and implications are captured for end users, along with the impact of changes from the above tasks. We have made the implementation almost irrelevant to end users, but many of them will still wish to understand how this fits together.
It is also important to document that use of the riak-shell
translations to and from user-friendly time strings will impact
performance.
Currently, users must enter integer values for times, assumed for quanta calculations to be UNIX epoch values in UTC.
- Client libraries are free to accept any time/date format and process
them into UNIX epoch time, much like
riak-shell
could do. - The timestamp field in the database is a 64-bit unsigned integer.
- The epoch for UNIX time is midnight January 1, 1970 UTC (effectively
GMT).
- The specification of a DDL quantum function in terms of days will result in quanta boundaries at midnight UTC.
- Timezones with partial offsets from UTC (e.g. IST) will have quanta boundaries that do not fall on the local hour.
- The time values are assumed to be in milliseconds, which impacts all quanta calculations.
- The system clock and default timezone is currently completely irrelevant. All time values are calculated (quantum boundaries) or specified as input by the user/application.
- Leap seconds are silently swallowed into the timestamp.
- If a second is added at the end of a day, both that second and the one that follows are represented by a single timestamp. Wikipedia
Other than determining quantum boundaries, the fact that timestamps are assumed to be UNIX epoch values in milliseconds has zero consequences for Riak or the user. This means that the integer values currently stored in Riak can have any semantic meaning applied that the application/user so chooses.
All of these features are largely independent, although there is some sequencing that can be done based on utility and level of effort.
Allow clients to determine any above table-specific time-related metadata for display purposes is effectively a subtask of the previously-defined task to expose table metadata to client libraries.
- Allow users to enter ISO 8601 date/time strings in SQL (input and query).
- Display resulting timestamps in
riak-shell
as ISO 8601 format. - Allow clients to specify a session-specific time zone offset.
- Allow users to use time operations in queries.
- Implement
date
andtime
data types.
- Allow operators to specify explicit integer values as a quantum function.
- Allow operators to define the meaning of a timestamp increment.
- Allow operators to specify alternative timezones as the default for a table.
Date parsing will have to take place in Riak, with associated performance impact. Applications should do their own translation to UNIX time whenever practical.
Gordon has an existing date parsing library that handles some subset of the necessary changes.
Adding new keywords is problematic because SQL queries and inserts may have to
be modified with double quotes to escape column names after that point in time.
riak-shell
could issue warnings on future reserved words with planning.