Add calendars gregorian_tai and gregorian_utc #148

Closed · JimBiardCics opened this issue Oct 26, 2018 · 298 comments
Labels: agreement not to change (Issue closed with agreement not to make a change to the conventions); enhancement (Proposals to add new capabilities, improve existing ones in the conventions, improve style or format)

Comments

@JimBiardCics (Contributor) commented Oct 26, 2018

Introduction

The current CF time system does not address the presence or absence of leap seconds in data with a standard name of time. This is not an issue for model runs or data with time resolutions on the order of hours, days, etc., but it can be an issue for modern satellite swath data and other systems with time resolutions of tens of seconds or finer.

I have written a background section for this proposal, but I have put it at the end so that people don't have to scroll through it in order to get to the proposal itself. If something about the proposal seems unclear, I hope the background will help resolve your question.

Proposal

After past discussions with @JonathanGregory and again with him and @marqh at the 2018 CF Workshop, I propose the new calendars listed below and a change to existing calendar definitions.

  • gregorian_tai - When this calendar is called out, the epoch date and time stated in the units attribute are required to be Coordinated Universal Time (UTC) and the time values in the variable are required to be fully metric, representing the advance in International Atomic Time (TAI) since that epoch. Conversion of a time value in the variable to a UTC date and time must account for any leap seconds between the epoch date and the time being converted.
  • gregorian_utc - When this calendar is called out, the epoch date and time stated in the units attribute are required to be in UTC and the time values in the variable are assumed to be conversions from UTC dates and times that did not account for leap seconds. As a consequence, the time values may not be fully metric. Conversion of a time value in the variable to a UTC date and time must not use leap seconds. (See the sketch following this list.)
  • gregorian - When this calendar is called out, the epoch date stated in the units attribute is required to be in mixed Gregorian/Julian form. The epoch date and time have an unknown relationship to UTC. The time values in the variable may not be fully metric, and conversion of a time value in the variable to a date and time produces results of unknown precision.
  • the others - The other calendars all have an unknown relationship to UTC, similar to the gregorian calendar above.
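
As a concrete illustration (mine, not part of the proposal text), here is how the two new calendars might be attached to time variables using the netCDF4-python library. The file name, epoch, and data values are invented for the sketch; only the calendar strings come from the proposal above.

```python
import numpy as np
import netCDF4

with netCDF4.Dataset("swath_example.nc", "w") as nc:
    nc.createDimension("time", 3)

    # Fully metric elapsed time (proposed gregorian_tai): decoding to
    # UTC time stamps must apply any leap seconds since the epoch.
    t_tai = nc.createVariable("time_tai", "f8", ("time",))
    t_tai.units = "seconds since 2017-01-01 00:00:00"
    t_tai.calendar = "gregorian_tai"
    t_tai[:] = np.array([0.0, 0.5, 1.0])

    # Naive conversions of UTC time stamps (proposed gregorian_utc):
    # decoding back to UTC time stamps must NOT apply leap seconds.
    t_utc = nc.createVariable("time_utc", "f8", ("time",))
    t_utc.units = "seconds since 2017-01-01 00:00:00"
    t_utc.calendar = "gregorian_utc"
    t_utc[:] = np.array([0.0, 0.5, 1.0])
```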

The large majority of existing files (past and future) are based on artificial model time or don't need to record time precisely enough to require either of the new calendars (gregorian_tai or gregorian_utc). The modified definition of the gregorian calendar won't pose any problem for them. For users that know exactly how they obtained their times and how they processed them to get time values in a variable, the two new calendars allow them to tell users how to handle (and not handle) those time values.

Once we come to an agreement on the proposal, we can work out wording for Section 4.4 to reflect these new/changed calendar definitions.

Background

There are three parts to the way people deal with time. The first part is the counting of the passing of time, the second part is the representation of time for human consumption, and the third is the relationship between the representation of time and the orbital and rotational cycles of the earth. This won't be a deep discussion, but I want to define a few terms here in the hopes that it will help make things clearer. For gory details, please feel free to consult Google and visit places such as the NIST and US Naval Observatory websites. I'm glossing over some things here, and many of my definitions are not precise. My goal is to provide a common framework for thinking about the proposal, as opposed to writing a textbook on the topic.

The first part is the simplest. This is time as a scalar quantity that grows at a fixed rate. This, precisely measured, is what people refer to as 'atomic time' - a count of cycles of an oscillator tuned to resonate with an electron level transition in a sample of super-cooled atoms. The international standard atomic time is known as International Atomic Time (TAI). So time in this sense is a counter that advances by one every SI second. (For simplicity, I am going to speak in terms of counts of seconds throughout this proposal.) No matter how you may represent time, whether with or without leap days or seconds, this time marches on at a fixed pace. This time is metric. You can do math operations on pairs or other groups of these times and get consistently correct results. In the rest of this proposal I'm going to refer to this kind of time as 'metric time'.

The second part, the representation of time, is all about how we break time up into minutes, hours, days, months, and years. Astronomy, culture, and history have all affected the way we represent time. When we display a time as YYYY-MM-DD HH:MM:SS, we are representing a point in time with a label. In the rest of this proposal I'm going to refer to this labeling of a point in time as a time stamp.

The third part, the synchronization of time stamps with the cycles of the planet, is where calendars come into play, and this is where things get ugly. Reaching way back in time, there were three basic units for time - the solar year, the lunar month, and the solar day. Unfortunately, these three units of time are not compatible with each other or with counts of seconds. A solar day is not (despite our definitions) an integer number of seconds in length, a lunar month is not an integer number of solar days (and we pretty much abandoned them in Western culture), and a solar year is not an integer number of solar days or lunar months in length. If you attempt to count time by incrementing a time stamp like an odometer - having a given element increment once each time the element below it has 'rolled over', you find that the time stamps pretty quickly get out of synchronization with the sun and the seasons.

The first attempts to address this asynchrony were leap days. The Julian calendar specified that every four years February would wait an extra day to roll over to March. The Gregorian calendar addressed a remaining asynchrony by specifying that this only happens on the last year of a century (when it normally would) every fourth century. That was close enough for the technology of those days. Clocks weren't accurate enough at counting seconds to worry about anything else. But the addition of leap days (as well as months with random lengths) means that time stamps aren't metric. You can't do straightforward math with them.

In more recent times technology and science have advanced to the point that we can count seconds quite accurately, and we have found that keeping the time stamp hours, minutes, and seconds sufficiently aligned with the rising of the sun each day requires the addition (or subtraction) of leap seconds. On an irregular basis (potentially twice a year), the last minute of a day is allowed to run to 60 before rolling over, instead of 59 (or to roll over after 58, though lately there have been only additions). Coordinated Universal Time (UTC) is the standard for time stamps that include both leap days and leap seconds.

UTC time stamps represent the time in a human-readable form that is precise and synchronized with the cycles of the earth. But they aren't metric. It's not hard to deal with the leap days part because they follow a fixed pattern. But the leap seconds don't. If you try to calculate the interval between 2018-01-01 00:00:00 and 1972-01-01 00:00:00 without consulting a table of leap seconds and when they were applied, you will have a difference of 27 seconds between the time you get from your calculation and the time that has actually elapsed between those two time stamps. This isn't enough of a discrepancy to worry about for readings from rain gauges or measurements of daily average temperature, but an error of even one second can make a big difference for data from a polar-orbiting satellite moving at a rate of 7 km/second.
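
To make the arithmetic concrete, here is a small sketch (mine; the 27-second figure is the one quoted above) of the gap between a leap-second-naive difference and the true elapsed time:

```python
from datetime import datetime

# Leap-second-naive difference, as most libraries would compute it:
naive_seconds = (datetime(2018, 1, 1) - datetime(1972, 1, 1)).total_seconds()

# 27 leap seconds were inserted between those two epochs, so the true
# elapsed (atomic) time is 27 seconds longer than the naive result:
true_seconds = naive_seconds + 27

print(true_seconds - naive_seconds)  # -> 27.0
```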

The clocks in our computers can add further complexity to measuring time. The vast majority of computers don't handle leap seconds. We typically attempt to address this by using time servers to keep our computer clocks synchronized, but this is done by altering the metric time count in the computer rather than modifying the time stamps by updating a table of leap seconds.

Furthermore, most computer software doesn't have 'leap second aware' libraries. When you take a perfectly exact UTC time stamp (perhaps taken from a GPS unit) and convert it to a count of seconds since an epoch using a time calculation function in your software, you are highly likely to have introduced an error of however many leap seconds that have been added between your epoch and the time represented by the time stamp.

As a result of all this, many of the times written in netCDF files are not metric times, and there is no good way to know how to produce accurate time stamps from them. They may be perfectly metric within a given file or dataset, they may include skips or repeats, or they may harbor non-linearities where there are one or more leap seconds between two time values.

We have another minor issue for times prior to 1972-01-01. There's no good way to relate times prior to that epoch to times since then - not at the level of tens of seconds or better. I'd be surprised if this would ever be a significant problem in our domain.

To summarize, we have TAI, which is precise metric time. We have UTC, which is a precise, non-metric sequence of time stamps that are tied to TAI. And we have a whole host of ways that counts of time since an epoch stored in netCDF files can be inaccurate, to a level as high as 37 seconds (the current leap second offset between TAI and UTC).

Most uses of time in netCDF aren't concerned with this level of accuracy, but for those that are, it can be critical.

@cameronsmith1 commented

I remember when CF discussed time a few years ago. It was the longest discussion I ever followed on CF. You have addressed the main points that I remember.

@hrajagers commented

The proposal looks sensible to me. One item that might be worthwhile to discuss or mention is the way in which times should be specified when the calendar is "gregorian_tai". It seems to me that the definition implicitly includes the requirement to store time as "seconds since ..." a reference date/time. For any time unit larger than seconds it will be difficult to do conversions to seconds consistently ... unless we define "minutes" as strictly 60 seconds and so on.

@JimBiardCics (Contributor, Author) commented

All units in time variables follow their UDUNITS definitions. According to UDUNITS, seconds are SI seconds, minutes are 60 seconds, hours are 3600 seconds, days are 86400 seconds, weeks are 604800 seconds (7 days), and fortnights are 1209600 seconds (14 days). Years are 31556925.9747 seconds (365.242198781 days). Months are a mess, with 1 month defined as being 1/12 of a year - 30.436849898 days. As CF says in section 4.4, time should not use units of months or years, as those quantities will produce unexpected results if you are not very careful with them.

I see no problem in storing times in any units from yoctoseconds (1e-24 seconds) up to and including fortnights, as they are all clean and consistent multiples of SI seconds.
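
For reference, the UDUNITS factors above expressed as code (a quick consistency check of the quoted numbers, not an official table):

```python
# UDUNITS time units as multiples of the SI second.
SECONDS_PER = {
    "minute":    60.0,
    "hour":      3600.0,
    "day":       86400.0,
    "week":      7 * 86400.0,    # 604800
    "fortnight": 14 * 86400.0,   # 1209600
    "year":      31556925.9747,  # 365.242198781 days
}
SECONDS_PER["month"] = SECONDS_PER["year"] / 12  # 1/12 of a year

# The month factor works out to the 30.436849898 days quoted above:
assert abs(SECONDS_PER["month"] / SECONDS_PER["day"] - 30.436849898) < 1e-8
```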

@cameronsmith1 commented Oct 26, 2018

If you are going to specify seconds (or yoctoseconds), is it necessary to specify what type of number (integer, real) it is, to make sure that the specified number can be large enough and precise enough to be useful? Specifically, if you are using some sort of integer, the number of seconds you need could exceed the maximum value for some integer types, and if you are using real numbers there may not be enough precision to distinguish between one second and the next (when the number of seconds gets large).

@ChrisBarker-NOAA (Contributor) commented

@cameronsmith1:

a given data file is, by definition, using a particular type for each variable -- so yes, the file creator needs to be thoughtful about it, but I don't think CF has to say anything about it.

@davidhassell added the "enhancement" label Oct 29, 2018
@martinjuckes (Contributor) commented

Hello Jim, thanks for directing me here from the mailing list discussion on 360-day calendars.

As you might guess from my latest contribution to that discussion, I have reservations about relaxing the specification of time to allow a non-metric interpretation. Introducing a system which makes the interpretation of the units dependent on the value of an additional attribute looks like a substantial step to me, and I can't see any justification for it.

I'm not sure if I understand the comments in your proposal about non-metric time values. Take, for example, a short period spanning the last leap second, which occurred at 2016-12-31 23:59:60. As a result of this leap second being inserted, 2 minutes since 2016-12-31 23:59:00 should be 0 minutes since 2017-01-01 00:00:59 rather than 0 minutes since 2017-01-01 00:01:00. This may be counter-intuitive, but the measure of time in minutes is still metric. The minutes counter in the time stamp is not enumerating constant intervals of 1 minute, just as the year counter is not enumerating years of constant length (the Gregorian year, neglecting leap seconds, is 365.2425 days, while the counter in the time stamp sometimes represents an increment of 366 days, other times 365).

Software often adopts a simplified relationship between the counters in the time stamp and the metric time. An extreme case of this is the 360-day calendar we have been discussing on the link I mention above, in which we have 30 days to the month and 12 months to the year, so that all counter increments relate directly to specific time intervals.

My understanding is that by default the time stamp (the date following since in the units string) follows ISO 8601, which does include leap seconds. However, the leap seconds are not in the UDUNITS software, so we don't have an easy way of making use of this. The current CF convention implies that the interpretation of the units string follows UDUNITS, and UDUNITS always treats the time stamp as being in the standard Gregorian/Julian calendar. I have the impression that this is not always the intended meaning, but that would be a diversion from this thread.

All the references I've found indicate that the time elapsed in the UTC system is exactly, by definition, time elapsed as measured by the atomic clock. The only difference is that UTC includes a concept of days, hours, and minutes, and the UTC minute does not have a constant length.

It seems to me that the distinction is between the Julian/Gregorian calendar, in which the interval between 2016-12-31 23:59:00 and 2017-01-01 00:01:00 is 120 seconds, and the standard UTC calendar, in which this interval is 121 seconds.

Wouldn't it be sufficient, as far as the CF standard is concerned, to recognise that the Gregorian/Julian calendar is no longer standard, and perhaps introduce the term you suggest, gregorian_utc, as a new alias for standard?

There is a separate problem concerning the UDUNITS implementation ... but if terminology is agreed here, we could discuss that on the UDUNITS repository issues.

@JimBiardCics (Contributor, Author) commented

@martinjuckes You are exactly correct about the proper interpretation of time around a leap second.

A large number of existing observational datasets obtain time as a UTC time stamp and then convert it to an elapsed time since an epoch using "naive" software which does not take leap seconds into account. A growing number of observational datasets directly acquire precise and accurate elapsed times since an epoch, either from a GPS unit or a satellite operating system, or they acquire time stamps that don't include any leap seconds (TAI time stamps, for example) and convert them using naive software. As it currently stands, those creating the time variables have no way to indicate which way the data was acquired, and users have no way to tell how they should interpret the values.

As I mentioned, the question is often unimportant because the time resolution of the data acquired is coarse enough that it doesn't matter, or because the data comes from non-physical systems such as models. When the question is important, though, it can be critical.

The CF explanation of the calendar attribute is

In order to calculate a new date and time given a base date, base time and a time increment one must know what calendar to use. For this purpose we recommend that the calendar be specified by the attribute calendar which is assigned to the time coordinate variable.

We are trying to make a way for people to indicate what they have done when it matters without burdening those for whom this is not important.

Here's a table showing the impact of different processing sequences with respect to leap seconds when attempting to obtain a UTC time stamp from a time variable with an accurate UTC epoch date and time (assuming there are one or more leap seconds within the time variable range or since the epoch date and time).

| Time Source | Conversion to Time Var | Accurate Time Var? | Conversion from Time Var | Result |
| --- | --- | --- | --- | --- |
| UTC time stamp | Naive | No | Naive | Correct UTC |
| UTC time stamp | Naive | No | Smart | Incorrect |
| UTC time stamp | Smart | Yes | Naive | Incorrect |
| UTC time stamp | Smart | Yes | Smart | Correct UTC |
| Accurate elapsed time | - | Yes | Naive | Incorrect |
| Accurate elapsed time | - | Yes | Smart | Correct UTC |

Naive - The conversion is unaware of leap seconds.
Smart - The conversion is aware of leap seconds.
Accurate Time Var - The values in the time variable have no unexpected internal or external offsets due to leap seconds.

The last two entries in the table are, in truth, equivalent to the middle two. It doesn't really matter whether you started with UTC time stamps, TAI time stamps, or GPS elapsed time counts - as long as you end up with accurate TAI-compatible elapsed time values, the conversion to correct UTC time stamps must be smart (properly handle leap seconds).

I'm open to other names for the two new calendars, but I think the suggested names are reasonable.

@ChrisBarker-NOAA (Contributor) commented Oct 29, 2018

Lots of good points here, but I think this should be very simple:

The CF explanation of the calendar attribute is

In order to calculate a new date and time given a base date, base time and a time increment one must know what calendar to use. For this purpose we recommend that the calendar be specified by the attribute calendar which is assigned to the time coordinate variable.

The fact is that UTC and TAI are slightly different calendars -- as such, users absolutely need to be able to specify which one is appropriate. Done.

So this proposal is a great (maybe the only, really) compromise between the specification we need and backward compatibility.

I am a bit confused by this, though:

gregorian_utc

> the time values may not be fully metric.

huh? not entirely sure what "fully metric" means, but if you have e.g.:

seconds since 1980-01-01T12:00:00

then the actual values in the array should fully represent monotonically increasing values, and time[j] - time[i] should always represent exactly the number of seconds that has elapsed.

what is non-metric about that???

BTW, this all is a really good reason to encode time in this way, rather than, say ISO datetime strings....

-CHB

@martinjuckes (Contributor) commented

@JimBiardCics Thanks for that detailed reply. I think I understand now. I've also done a bit more background reading and learned how extensive this problem is, causing disruption to companies like Google and Amazon, who have both adopted messy (and mutually inconsistent) work-arounds to the problem.

I'd like to split this into two related problems:
(1) the current CF convention does not fully explain how to deal with leap seconds, and does not distinguish between TAI time and UTC time;
(2) users struggle with the available tools (which are not good) and we need to make things easier to avoid having repeated errors in the information encoded in our netcdf files;

(1) is reasonably easy to deal with (at first sight) ... we need two calendars which differ by the leap seconds, so that 2018-06-01 12:00:00Z [calendar A] corresponds to 37 seconds since 2018-06-01 12:00:00Z [calendar B]. In this case the minute counter of calendar A would be of variable length. Calendars A and B would both be based on the Gregorian calendar. The only difficulty is that the translation from Calendar A to Calendar B is undefined for dates after 2019-06-30 -- for a standard such as CF this is problematic.
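
For concreteness, a minimal sketch (mine, not from the thread) of the calendar A/B offset described above, using a deliberately truncated, hard-coded leap-second table; a real implementation would have to track future IERS announcements:

```python
from datetime import datetime

# (UTC date the offset took effect, cumulative TAI-UTC offset in seconds).
# Truncated for illustration; the full table has an entry per leap second.
LEAP_SECONDS = [
    (datetime(1972, 1, 1), 10),
    (datetime(2015, 7, 1), 36),
    (datetime(2017, 1, 1), 37),
]

def tai_minus_utc(stamp: datetime) -> int:
    """Cumulative leap-second offset (TAI - UTC) in force at `stamp`."""
    offset = 0
    for effective, cumulative in LEAP_SECONDS:
        if stamp >= effective:
            offset = cumulative
    return offset

# The instant labelled 2018-06-01 12:00:00Z in calendar A (UTC-aware) is
# "37 seconds since 2018-06-01 12:00:00Z" in calendar B (leap-second-free):
print(tai_minus_utc(datetime(2018, 6, 1, 12, 0, 0)))  # -> 37
```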

There also appears to be a slight inconsistency in the convention between the introduction to section 4.4, which explains how the time stamp relates to UTC time, and subsection 4.4.1 (which you quote) which states that the time stamp will be interpreted according to the calendar. Perhaps this just needs a clarification that the introductory remarks only apply for specific calendars. There is a further complication in that Udunits neglects the leap seconds, so it is not accurate at the level you want to achieve here.

(2) introduces some complexity, and I think this is where the idea of a "non-metric" coordinate comes in. In an ideal world we might deal with (2) by improving the available tools, but we don't know when the next leap second is coming (it will be at midnight on a June 30th or December 31st, and almost certainly not this year, but beyond that we have to wait for a committee decision, so nothing can be programmed in advance).

What I think you are suggesting, to address point (2), is that we allow a construction in which 2 minutes since 2016-12-31 23:59:00 is equivalent to 0 minutes since 2017-01-01 00:01:00, with both being UTC times, so that the length of the 2 minutes is 121 seconds. Consequently, any coordinate defined through this kind of construction would be, in your words, a non-metric time. I can see the logic of introducing something of this kind, but (and this takes us back to the conversation in the 360-day calendar thread) I don't think we can do it in a variable with standard name time, or use units which are defined as a fixed multiple of the SI second. The time concept is already very busy, and the metric nature of time plays an important role in many aspects of the physical system. If you accept that this is a new variable, I'd be happy to support the proposal (e.g. you suggested abstract_time in the email thread, and datetime is a common tag in libraries dealing with calendars).

Similarly, having units of measure which have a fixed meaning independent of context is a firmly established principle in the physical sciences and beyond, so if we want a unit of measure which is a bit like a minute but sometimes 61 seconds long, I'd be happy to accept it provided it is called something other than minute (and the same goes for hour, day, month, and year).

@martinjuckes (Contributor) commented

Here is a CDL illustration of how I think this could work, with separate variables for the (1) months and years (with each year 12 months, months variable numbers of days), (2) days, hours, minutes (60 minutes to the hour, 24 hours to the day, variable length minutes) and (3) seconds (SI units .. only counting seconds since last minute). I've not included a true time axis, as trying to convert the times to UTC times in a single parameter would be error prone and defeat the point of the example.

```
netcdf example {
dimensions:
	time = 2 ;
variables:
	float mydata(time) ;
		mydata:coordinates = "time_ca time_cl seconds" ;
	int time_ca(time) ;
		time_ca:long_name = "Calendar months [years*12 + months]" ;
		time_ca:units = "month_ca since 1980-01-01 00:00:00" ;
		time_ca:calendar = "gregorian_utc" ;
		time_ca:standard_name = "time_ca" ;
	int time_cl(time) ;
		time_cl:long_name = "Calendar minutes [(days*24 + hours)*60 + minutes]" ;
		time_cl:units = "minute_cl since 1980-01-01 00:00:00" ;
		time_cl:calendar = "gregorian_utc" ;
		time_cl:standard_name = "time_cl" ;
	float seconds(time) ;
		seconds:long_name = "Seconds elapsed since last full UTC minute" ;
		seconds:standard_name = "time" ;
		seconds:units = "s" ;

// global attributes:
		:Conventions = "CF-1.7" ;
		:title = "Sample of proposed new abstract time variables" ;
		:comments = "encodes '1980-06-01 12:02:15' and '1981-06-01 12:04:35' in UTC time" ;
data:

 mydata = 0, 1 ;

 time_ca = 6, 18 ;

 time_cl = 722, 724 ;

 seconds = 15, 35 ;
}
```

@ChrisBarker-NOAA (Contributor) commented

About udunits:

There is a further complication in that Udunits neglects the leap seconds, so it is not accurate at the level you want to achieve here.

I'm confused:

CF "punts" to Udunits for unit definitions. But there is nothing in CF that says you have to use udunits for processing your data, and I also don't see any datetime manipulations in Udunits anyway (maybe I've missed something). So what does Udunits have to do with this conversation?

Udunits does make the unfortunate choice of defining "year" and "month" as time units, but I thought CF already discouraged (if not banned) their use.

@ChrisBarker-NOAA (Contributor) commented

The only difficulty is that the translation from Calendar A to Calendar B is undefined for dates after 2019-06-30 -- for a standard such as CF this is problematic.

Is it? It's terribly problematic for libraries -- which is why most of them don't deal with it at all. But for CF, I'm not sure it matters:

datetimes are encoded as:

a_time_unit since a_datetime_stamp.

TAI and UTC define seconds the same way. And no one should use a time unit that isn't a clearly defined multiple of seconds.

The only difference between TAI and UTC here is when you want to convert that encoding to a datetime stamp -- exactly what datetime stamp you get depends on whether you are using TAI or UTC, but the seconds are still fine.

If someone encodes a datetime in the future, and specifies TAI time, then clients will not be able to properly convert it to a datetime stamp -- but that isn't a CF problem.

@ChrisBarker-NOAA (Contributor) commented

Breaking these down into manageable chunks ...

... we allow a construction in which 2 minutes since 2016-12-31 23:59:00 is equivalent to 0 minutes since 2017-01-01 00:01:00, and for these to be UTC times, so that the length of the 2 minutes is 121 seconds. Consequently, any coordinate we defined through this kind of construction would be, in your words, a non-metric time.

Yeah, I can see that would be "non-metric" -- but why in the world would we want to allow that? and in fact, how could you allow that???

CF time encoding involves one datetimestamp, not two. It would be a really bad idea to somehow mix and match TAI and UTC in one time variable -- specify the timestamp in UTC, but say that processing tools should use TAI from there on??

two minutes is never, ever, anything but 120 seconds. Let's not confuse what I am calling a timestamp -- i.e. the "label" for a point in time -- with a time duration. So:

the duration between:

2016-12-31 23:59:00 and 2017-01-01 00:01:00

is 121 seconds in the UTC calendar, and 120 seconds in the TAI calendar.

so:

120 seconds since 2016-12-31 23:59:00 will be a different timestamp if you use the TAI calendar than if you use the UTC calendar -- just as it may be different if you use any other calendar....

So what is the problem here?

Granted -- if you want to accurately create or use a time variable with the TAI calendar, you need a library that can do that -- but that is not CF's problem.

@JimBiardCics (Contributor, Author) commented

@ChrisBarker-NOAA As @martinjuckes mentioned in his comment, I'm calling the UTC time 'non-metric' because a time variable for data sampled at a fixed rate based on UTC time stamps converted to elapsed time since an epoch without dealing with leap seconds may contain deviations from what you would expect. If you attempt to 'do math' with the contents, you may find that adding an interval to a time does not produce the time you expected and subtracting two times does not produce the interval expected.

Let's say I have acquired data at regular intervals, dt, and I have captured the time by grabbing accurate UTC time stamps for each acquisition. If I construct my time variable by naively converting my acquired time stamps to elapsed time since my epoch UTC time stamp, I might have one or more problems lurking in my time variable.

  1. If a leap second occurred over the span of my time variable, there will be a point in the times where one of three problems appears.
    a. If dt is greater than 1 second, there will be an interval between successive time values that is less than dt. The formula t[i] = t[0] + i*dt won't hold for all the times in the variable.
    b. If dt is equal to 1 second, the variable won't be monotonic. There will be a pair of time values that are identical.
    c. If dt is less than 1 second, the variable won't be monotonic, but rather than a pair of time values that are identical, there may be a section where one or more time values are less than a preceding value.
  2. If one or more leap seconds occurred in the time range between my epoch time stamp and my first acquired time stamp, my time values will be internally consistent, but the whole set of values will be smaller than expected.

The non-monotonicity problem is one that I don't even want to get into. And, again, for someone measuring things once an hour (for example) this is all pretty ignorable.
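
A minimal sketch (mine, not from the thread) of the dt equal to 1 second case: a hypothetical naive conversion that, like most libraries, cannot represent second 60 and so rolls it into the next minute, producing a repeated time value:

```python
from datetime import datetime, timedelta

EPOCH = datetime(2016, 12, 31, 0, 0, 0)

def naive_elapsed(stamp: str) -> float:
    """Seconds since EPOCH, ignoring leap seconds (as most libraries do)."""
    date, clock = stamp.split()
    h, m, s = (int(x) for x in clock.split(":"))
    extra = timedelta(0)
    if s == 60:  # the leap second itself: datetime rejects second 60,
        s = 0    # so roll it into the next minute, losing the inserted second
        extra = timedelta(minutes=1)
    dt = datetime.fromisoformat(f"{date} {h:02d}:{m:02d}:{s:02d}") + extra
    return (dt - EPOCH).total_seconds()

# Samples acquired once per second across the 2016-12-31 leap second:
stamps = ["2016-12-31 23:59:59", "2016-12-31 23:59:60", "2017-01-01 00:00:00"]
print([naive_elapsed(s) for s in stamps])
# -> [86399.0, 86400.0, 86400.0]: an identical pair, as in case 1b above
```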

@ChrisBarker-NOAA (Contributor) commented

About @martinjuckes' CDL:

I am really confused as to why anyone would ever want to do that.

I remember a thread a while back about encoding time as ISO 8601 strings, rather than "time_unit since timestamp" -- at the time, I opposed the idea, and now we have an even better reason to.

If we stick to the current CF convention, then all we need to do is specify the TAI calendar as a valid calendar (and clarify UTC vs TAI) -- that's it -- nothing else needs to change, and there is no ambiguity.

Is the goal here to be able to specify TAI in a way that users can use it without a TAI-aware library? I think that's simply a bad idea -- if you don't have a TAI-aware library, you have no business working with TAI times (at least if you care about second-level precision).

@ChrisBarker-NOAA (Contributor) commented

@JimBiardCics wrote:

I'm calling the UTC time 'non-metric' because a time variable for data sampled at a fixed rate based on UTC time stamps converted to elapsed time since an epoch without dealing with leap seconds may contain deviations from what you would expect. If you attempt to 'do math' with the contents, you may find that adding an interval to a time does not produce the time you expected and subtracting two times does not produce the interval expected.

Thanks, got it.

My take on this -- if you do that, you have created invalid, incorrect data. CF should not encode this as a valid thing to do. As far as I'm concerned, it's the same as if you did datetime math with a broken library that didn't do leap years correctly.

And frankly, I'm not sure HOW we could accommodate it anyway -- I'm still a bit confused about exactly when leap seconds are applied to what. (That is, I think *nix systems, for instance, will set the current time with leap seconds -- so the "UTC" timestamp is really "TAI as of the last time it was reset".) Which I think is the concern here -- if you are collecting data and are getting a timestamp from a machine, you don't really know which leap seconds have been applied.

But again, that's broken data....

If we were to try to accommodate this kind of broken data, I have no idea how one would do it? One of the reasons that leap seconds are not used in most time libraries is that they are not predictable. So a lib released last year may compute a different result than one released today -- how could we even encode that in CF?!?!

@JimBiardCics (Contributor, Author) commented

@ChrisBarker-NOAA The first idea here is to allow data producers a way to clearly indicate that the values in their time variables are metric elapsed time with none of the unexpected discrepancies that I referred to earlier. Instead of gregorian_tai, we could call the calendar gregorian_linear or gregorian_metric or some such. We thought to reference TAI because TAI is, at base, a metric, linear count of elapsed seconds since the TAI epoch date/time.

The second idea here is to allow data producers to clearly indicate that the values in their time variables are non-metric elapsed time potentially containing one or more of the unexpected discrepancies that I referred to earlier, and to indicate that in all but one case they will get correct UTC time stamps if they convert the time values to time stamps using a method that is unaware of leap seconds. This result is not guaranteed if you add an offset to a time value before conversion, and differences between time values may not produce correct results. You may obtain one or more incorrect time stamps if your time values fall in a leap second.

Keep in mind that the potential errors are, as of today, going to be less than or equal to 37 seconds, with many of them being on the order of 1-2 seconds.

For backward compatibility, and for the vast number of datasets out there where errors of this magnitude are ignorable, the existing gregorian calendar (with a warning added in the Conventions section) will remain. It would impose a pretty severe burden to insist that all data producers use only gregorian_tai or gregorian_utc going forward.

The use of the word metric is problematic because minds inevitably go to the metric system. I have used it this way so many times when thinking about this that I forget that it is confusing.

@JimBiardCics (Contributor, Author) commented

@martinjuckes The point here is that we have an existing way to represent time that has been used for quite a few years now. This was never a problem for climate model data or data acquired on hourly or longer time intervals. We may at some future point (CF 2.0?, CF 3.0?) want to consider some significantly different way of handling time. For CF 1.* we want to find a way to accommodate satellite and other high frequency data acquisition systems without imposing unneeded burdens on anyone.

CF says that the purpose of the calendar attribute is to tell you what you need to know to convert the values in a time variable into time stamps. We aren't telling them (at least not directly) how we obtained the values in the time variable. We are telling them how to use them. @JonathanGregory, @marqh, and I came to the conclusion that, while there may be cases we didn't consider, pretty much every time variable anyone might create (within reason) would fall into one of three categories:

  • The elapsed time values are fully metric and their relationship to the UTC epoch date and time is accurate. (The gregorian_tai case.) You must take leap seconds into account when converting these time values into UTC time stamps if you want full accuracy.
  • The elapsed time values are almost certainly not fully metric and their relationship to the UTC epoch date and time is probably not accurate, but if you convert them to UTC time stamps without adding any offsets, using a method that does not take leap seconds into account, you will get UTC time stamps with full accuracy. (The gregorian_utc case.)
  • We don't have a clue about the metricity or accuracy of the elapsed time values or the epoch date and time. At least not to within 37 seconds. And we don't care. (The updated gregorian case.)

Time representation is a monster lurking just under the surface. Everything was fine until we looked down there. The only pure time is counts of SI seconds (or fractions thereof) since some agreed starting point. Everything else is burdened with thousands of years of history and compromise.

@martinjuckes (Contributor) commented

Hi @JimBiardCics: I don't know where you get the idea that CF time is different from SI time: as far as I can see, CF time is defined as SI time measured in SI units. Making a change to depart from SI units is a big deal.

@ChrisBarker-NOAA (Contributor) commented Oct 30, 2018 via email (body not shown)

@ChrisBarker-NOAA (Contributor) commented Oct 31, 2018 via email (body not shown)

@martinjuckes (Contributor) commented

Hi @ChrisBarker-NOAA: can't we keep the de facto definition of days in the Gregorian calendar as exactly 86400 seconds? Precision in the standard is a good thing. Clearly there will be less precision in the time values entered in many cases, but that is up to the users, who will, hopefully, provide the level of accuracy required by their application. This would make gregorian equivalent to gregorian_tai.

You ask why it matters to the convention that the UTC calendar is not defined into the future past the next potential leap second: this is an issue of reproducibility and consistency. The changes are small, but we are only introducing this for people who care about small changes. The ambiguity only applies to the time-stamp: the interpretation of the statement x seconds since 2018-06-01 12:00:00Z will not be affected by a future leap second, but the meaning of y seconds since 2020-06-01 12:00:00Z may change if a leap second is introduced next year. One solution would be to disallow the use of a time-stamp after the next potential leap second.

I agree with Chris that we can't hope to retro-fix problems which may have occurred due to people putting in information inaccurately. What we need is a clear system which allows people to enter information accurately.

@cf-metadata-list commented Oct 31, 2018 via email (body not shown)

@JimBiardCics (Contributor, Author) commented

@ChrisBarker-NOAA @martinjuckes
For the moment, let's set aside the question of names for the calendars.

There is nothing at all wrong with specifying that the epoch time stamp in the units attribute always be a correct UTC time stamp. In fact, allowing the epoch time stamp to be from a TAI or UTC clock will increase the chances that the data will be handled incorrectly. If you are sophisticated enough to care about TAI, you will have no problem dealing with a UTC time stamp.

I am explicitly assuming that all UTC time stamps are correct and accurate at the times they were acquired or constructed. If you are getting your time stamp from a PC that isn't actively synced by a time server, you shouldn't bother to use either of these new calendars.

When you read time out of a GPS unit, you can get a count of seconds since the GPS epoch, and I believe you can get a GPS time stamp that doesn't contain leap seconds (like TAI time stamps, but with a fixed offset from TAI), but most people get a UTC time stamp. The GPS messages carry the current leap second count and receivers apply it by default when generating time stamps. That's something I learned about from Aaron Sweeney a year or two ago - well after the big discussion we had about all of this a few years back.

There are quite a lot of high-precision data acquisition platforms out there that start with accurate UTC time stamps obtained from GPS receivers. Many of them don't care about metricity. They just want their time stamps, but CF tells them that they must store time as elapsed time since an epoch.

There's not really any such thing as an elapsed time that is UTC vs TAI. At core, true elapsed time - a count of SI seconds since an event - is the same for both. The UTC time stamp for that event may not be identical to the TAI time stamp for that same event, but they both reference the same event, and the elapsed time since that event is the same no matter which time system you are using.

The UTC time system provides a prescription in terms of leap seconds for how to keep UTC time stamps synchronized with the rotation of the earth. Just like the Gregorian calendar system provides a prescription in terms of leap days for how to keep date stamps synchronized with the orbit of the earth. The only difference - and it is an admittedly important one - is that the UTC leap second prescription does not follow a fixed formula like the Gregorian leap day prescription does. The UTC time system declares that the time stamp 1958-01-01 00:00:00 references the same time as the matching TAI time stamp. The TAI time system provides no prescription for keeping TAI time stamps synchronized with the rotation of the earth.

Time stamps - whether UTC, TAI, GPS, or something else - are, in essence, a convenience for humans. No matter what time system or calendar system you use, the number of seconds or days that has elapsed since any given date and time is the same.

In a perfect world, all data producers would use a leap-second-aware function to turn their lists of time stamps into elapsed time values and all time variables would be "perfect". That would also force all users of those time variables to use a leap-second-aware function to turn the elapsed time values into time stamps. But that's not the world we live in. Does naive conversion of UTC time stamps into elapsed times have the potential to produce non-monotonic time coordinate variables that violate the CF conventions? Yes. Does it cause any real problems (for the vast majority of cases and instances of time) if people use this "broken" method for encoding and decoding their time stamps? No.

At the end of the day, it doesn't matter much what process you used to create your elapsed time values. For the small number of cases where differences of 1 - 37 seconds matter, we are trying to make a way for data producers to signal to data users how they should handle the values in their time variables while staying within the existing CF time framework and acknowledging CF and world history regarding the way we deal with time (which isn't very consistent and is composed of successive corrective overlays over centuries).

@martinjuckes (Contributor) commented

Hello @JimBiardCics , thanks again for a detailed response.

I support the aim of allowing users to place either a UTC or TAI time stamp in the units statement (so that they can do whatever fits with their existing processes) and making it possible for them to declare which they are using. The suggestion of using the calendar attribute for this makes sense.

I think we are agreed now that there is a unique and well defined mapping between these time stamps, and there is a unique and well defined way of calculating an elapsed time (in the SI sense) between any two such time stamps. I don't see how the layers of complexity need to come into this. The TAI time stamp counts up with 86400 seconds per TAI day, while the UTC has a known selection of days with an extra second in the final minute.

All we can do is define these things clearly, we can't force users to adopt best practice. As you say, some people don't have accurate clocks, just as some record temperature without accurate thermometers.

I would disagree with you on one point: a non-monotonic data array is no problem, but a non-monotonic coordinate array is a problem. As Chris commented, people who need to know about this sort of problem are likely to have sorted it out before they get around to writing NetCDF files.

@ChrisBarker-NOAA (Contributor) commented

@JimBiardCics wrote:

For the moment, let's set aside the question of names for the calendars.

OK, though a bit hard to talk about :-)

In a perfect world, all data producers would use a leap-second-aware function to turn their lists of time stamps into elapsed time values and all time variables would be "perfect".

almost -- I think TAI time is also a perfectly valid system for a "perfect" world :-)

But yeah, most datetime libraries do not handle leap seconds, which, kinda ironically, means that folks are using TAI time even if they think they are using UTC :-)

Does naive conversion of UTC time stamps into elapsed times have the potential to produce non-monotonic time coordinate variables that violate the CF conventions? Yes. Does it cause any real problems (for the vast majority of cases and instances of time) if people use this "broken" method for encoding and decoding their time stamps? No.

I'm not so sure -- I think having a time axis that is "non-metric", as you call it, can be a real problem. Yes, it could potentially be turned into a series of correct UTC timestamps by reversing the same incorrect math used to produce it, but many use cases work with the time axis in time units (seconds, hours, etc.), and need it to have nifty properties like monotonicity and differentiability.

we are trying to make a way for data producers to signal to data users how they should handle the values in their time variables while staying within the existing CF time framework and acknowledging CF and world history regarding the way we deal with time

Fair enough -- a worthy goal.

There is nothing at all wrong with specifying that the epoch time stamp in the units attribute always be a correct UTC time stamp. In fact, allowing the epoch time stamp to be from a TAI or UTC clock will increase the chances that the data will be handled incorrectly. If you are sophisticated enough to care about TAI, you will have no problem dealing with a UTC time stamp.

I disagree here -- the truth is that TAI is EASIER to deal with -- most datetime libraries handle it just fine; in fact, it is all they handle correctly. So I think a calendar that is explicitly TAI is a good idea.

I think we are converging on a few decisions:

  1. Due to legacy, uninformed users, poor library support, and the fact that it just doesn't matter to most use cases, we will have an "ambiguous with regard to leap seconds" calendar in CF. Probably called "gregorian", because that's what we already have, and, explicit or not, that's what it means with existing datasets. So we need some better docs here.

  2. Do we need an explicit "UTC" calendar, in which leap seconds ARE taken into account. The file would only be correct if the timestamp is "proper" UTC, and you would get the right (UTC) timestamps back if and only if you used a leap-second-accounting for time library. The values themselves would be "metric" (by Jim's definition)

  3. Do we need an explicit "TAI" calendar. The file would only be correct if the timestamp is "proper" TAI, and you would get the right (TAI) timestamps back if and only if you did not apply leap seconds. The values themselves would be "metric" (by Jim's definition).

Note that the only actual difference between (2) and (3) is whether the timestamp is in UTC or TAI, which have differed since some time in 1958, by up to 37 seconds. In either case, the values themselves would be "proper", and you could compute differences, etc. easily and correctly.

  4. minor point -- do we disallow "days" in any of these, or are we happy with 1 day == 24 hours == 86400 seconds? I'm fine with days defined this way -- it is almost always correct, and always what people expect. (Though it could cause issues, maybe, with some datetime libs, but only those that account for leap-seconds, so I doubt it.)

  5. I think this is the contentious one: Do we have a calendar (encoding, really) that is:

Elapsed time since a UTC timestamp, but with elapsed time computed from a correct-with-regard-to-leapseconds UTC time stamp with a library that does not account for leap seconds. This would mean that the values themselves may not be "metric".

I think this is what Jim is proposing.

(by the way, times into the future (when leap-seconds are an unknown) as of the creation of the file should be disallowed)

Arguments for (Jim, you can add here :-) )

  • people are already creating time variables like this -- it would be nice to be able to explicitly state that that's what you've done, so folks can interpret them exactly correctly.

  • since a lot of instruments, computers, etc. use UTC time with leap seconds applied, and most time processing libraries don't support leap seconds -- folks will continue to produce such data, and, in fact, have little choice but to do so.

Arguments against:

  • This is technically incorrect data -- it says "seconds since", but it isn't actually always seconds since. We should not allow incorrect data as CF compliant. Bad libraries are not CF's responsibility.

  • A time axis created this way will be non-"metric" - that is, you can't compute elapsed time correctly directly from the values -- this is likely to lead to confusion, but worse still, hard to detect hidden bugs -- that is, code that works on almost any dataset might suddenly fail if a value happens to fall near a leap-second, and you get a zero-length "second" (or even a negative one? -- is that possible).

  • (same as above, really) -- a time variable of this sort can only be used correctly if it is first converted to UTC timestamps.

  • There may be issues with processing this data with some (most?) time libraries (in particular the ones that don't account for leap-seconds). This is because if you convert to a UTC timestamp with leap-seconds, you can get a minute containing a second labelled 60, for example:

December 31, 2016 at 23:59:60 UTC

And some time libraries do not allow that.

Example python's datetime:

In [3]: from datetime import datetime

In [4]: datetime(2016, 12, 31, 23, 59, 60)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-a8e1ba1d62e5> in <module>()
----> 1 datetime(2016, 12, 31, 23, 59, 60)

ValueError: second must be in 0..59

Given these trade-offs, I think CF should not support this -- but if others feel differently, fine -- but do not call it "UTC" or "TAI"! -- and document it carefully!

That last point is key -- this entire use-case is predicated on the idea that folks are working with full-on-proper-leap-second-aware UTC timestamps, but processing them with a non-leap-second-aware library -- and that this is a fully definable and reversible process. But at least with one commonly used datetime library (Python's built-in datetime), it simply will not work for every single case -- it will work for almost every case, so someone could process this data for years and never notice, but it's not actually correct! In fact, I suspect most computer systems can't handle December 31, 2016 at 23:59:60 UTC, and will never give you that value -- rather, (IIUC) they accommodate leap seconds by resetting the internal clock so that "seconds since the epoch" gives the correct UTC time when computed without leap seconds. But that reset happens at best one second too late (so that you won't get that invalid timestamp).

All this leads me to believe that if anyone really cares about sub-second-level precision over a period of years, then they really, really should be using TAI, and if they THINK they are getting one-second precision, they probably aren't, or have hidden bugs waiting to happen. I don't think we should support that in CF.

Final point:

When you read time out of a GPS unit, you can get a count of seconds since the GPS epoch, and I believe you can get a GPS time stamp that doesn't contain leap seconds (like TAI time stamps, but with a fixed offset from TAI), but most people get a UTC time stamp. The GPS messages carry the current leap second count and receivers apply it by default when generating time stamps.

OK -- but I suspect that yes, most people get a UTC timestamp, and most people don't understand the difference, and most people don't care about second-level accuracy over years.

The over years part is because if you have, say, a GPS track you are trying to encode in CF, you should use a reference timestamp that is close to your data -- maybe the start of the day you took the track. So unless you happen to be collecting data when a leap second occurs, there will be no problem.

For those few people that really do care about utmost precision -- they should use the TAI timestamp from their GPS -- and if it's a consumer-grade GPS that doesn't provide that -- they should get a new GPS! It's probably easier to figure out how to get TAI time from a GPS than it is to find a leap-second-aware time library :-)

Side note: Is anyone aware of a proper leap-second aware time library??

Sorry for the really long note -- but I do think we are converging here, and I won't put up a stink if folks want to add the sort-of-utc calendar -- as long as it's well named and well documented.

-CHB

@JimBiardCics (Contributor, Author) commented

@martinjuckes
Let me make something clear that seems to have been unclear. I am not proposing that the new calendars are declaring what sort of epoch time stamp is in the units attribute of time variables.

I think it is a bad idea to allow a TAI time stamp to be placed in the units attribute as the epoch for the elapsed times in the time variable. It adds nothing but confusion. The epoch should always be specified with a UTC time stamp. This causes no problems whatsoever. Someone going to the trouble of getting exact and correct elapsed times in their time variables will have no trouble figuring out the proper UTC time stamp for their epoch. Users, on the other hand, would be faced with figuring out which way to handle the epoch time stamp. If the user is aware of the implications of having exact and correct elapsed times in a time variable, they will also have the software on hand that is needed if they want to get time stamps from the time variable, or they will get it because it is important to them. If they aren't aware, a TAI epoch time stamp will maximize the error they will get if they perform a leap-second-naive conversion to get time stamps from the time variable contents and mistakenly think they are getting a UTC time stamp.

When I was talking about monotonicity I was intending it in reference to coordinate variables. I disagree with Chris (whether that is @ChrisBarker-NOAA or the unknown Chris who commented through the CF Metadata List account). People who are converting precise and accurate UTC time stamps into values in time variables using the tools most available to software developers for handling time without sensitivity to leap seconds are creating time variables that have the potential to be non-monotonic because of leap seconds. They have done it in the past, they are doing it now, and they will very likely continue to do so. It may be that they avoid the problem by breaking their data into files on UTC day boundaries (or hourly boundaries, etc), but if you aggregate those files you will create non-monotonic time variables. They will continue to do so because the potential one second per year non-monotonicity is less of a problem than forcing all their users to start decoding time variables into time stamps with leap-second-aware functions that are not readily available.

As I said before, this is a compromise position. Telling a significant group of data producers that they must change their software in a way that will cause widespread problems for their users is a good way to get them to ignore you.

@JimBiardCics (Contributor, Author) commented

On another front, I'm perfectly happy to entertain different names for the new calendars. I'm not a particular fan of gregorian_tai or gregorian_utc.

@ChrisBarker-NOAA (Contributor) commented

People who are converting precise and accurate UTC time stamps into values in time variables using the tools most available to software developers for handling time without sensitivity to leap seconds are creating time variables that have the potential to be non-monotonic because of leap seconds. They have done it in the past, they are doing it now, and they will very likely continue to do so.

Agreed -- and we need to accommodate that -- which we already do implicitly with "gregorian", and we should make it explicit in the sense that the whole leap-second thing is ambiguous.

Telling a significant group of data producers that they must change their software in a way that will cause widespread problems for their users is a good way to get them to ignore you.

Indeed. The question at hand is: do we codify as correct what is, at best, a kludgy, improper process?

As we've all said, the ambiguity is a non-issue for the vast majority of CF use cases.

So the question really is -- are there folks for whom:

  • Leap-seconds matter
  • They know and understand the issues of UTC vs TAI, etc.
  • They have no way to produce a "correct" time variable

I honestly don't know -- but it's been suggested that we can always use UTC as the timestamp, 'cause anyone who understands the distinction (and cares about it) will have access to a leap-second-aware library. In which case, no -- there isn't a large user base.

But given the dearth of leap-second-aware libraries, maybe we have no choice but to codify this kludgy encoding.
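
(For what it's worth, leap-second-aware libraries do exist; astropy is one. A minimal sketch, assuming astropy is installed:

    from astropy.time import Time

    t1 = Time("2016-12-31T23:59:59", scale="utc")
    t2 = Time("2017-01-01T00:00:00", scale="utc")
    print((t2 - t1).sec)  # 2.0 -- astropy knows that 2016-12-31T23:59:60 existed

Whether we can ask the whole CF tool chain to take on such a dependency is another question.)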

@martinjuckes
Copy link
Contributor

Hello Lars,

I completely agree that we should design the standard to facilitate implementation, but that is not the same as supervising or managing implementation.

The UDUNITS issue opens up a can of worms. There are 41 instances of the word "udunits" in the current conventions document, many of them erroneous because they refer to properties of a long-expired version of udunits. For example, "udunits.dat" is referred to several times (it has been replaced by udunits2-accepted.xml, udunits2-base.xml, udunits2-common.xml and udunits2-derived.xml), the text I quoted above is from the old documentation, ppm is said to be absent (it is now included), and the routines utScan and utIsTime are no longer in the udunits package.

There is also a curious vagueness in the statement that "The value of the units attribute is a string that can be recognized by UNIDATA’s Udunits package [UDUNITS], with a few exceptions that are given below". I've always believed that units such as mole etc. should be interpreted as defined by Udunits, which goes a bit further than just having the string recognised by the software package. Referring to Udunits for a definition of SI units seems a little odd to me: I would prefer a direct reference to SI here. However, these Udunits issues take us away from the central theme of this discussion. Would it make better sense to address them in a separate ticket?

On Milankovitch cycles: yes, that is certainly true. And also that the mean Gregorian year is not exactly equal to the mean period of the Earth's orbit around the sun, so just as solar noon will drift relative to our fixed day gregorian calendar, so will the solstices drift relative to the months in that calendar. The key thing to stress here is that we are defining a calendar system based on elapsed time, not on recurring orbital configurations.

In my proposed text on Jan 28th I suggested re-organising Karl's draft to create a new sub-section specifically for the reference time. I was hoping that this would be sufficiently clear for David and others developing software. If we do progress as far as having a pull request for a revised convention I'm sure David and others will check it carefully. Are there any specific weaknesses in these proposals that you see? Do you think we need a different approach to the text?

@larsbarring
Copy link
Contributor

Hello Martin,

  • Facilitate implementation but not supervise: of course.

  • Udunits can of worms: fine with a separate ticket, but I suggest this "renovation" of the CF text should be done before CF-1.8 goes live.

  • There are however a couple of udunits-related things that belong here: At the beginning of section 4.4 Time coordinate of CF-1.7 the text reads

    The units attribute takes a string value formatted as per the recommendations in the Udunits package [UDUNITS] . The following excerpt from the Udunits documentation explains the time unit encoding by example:

    The specification:

    seconds since 1992-10-8 15:15:42.5 -6:00
    ... ...

    This should be replaced with a reference to ISO 8601 and its date/time formatting. This also means changing a number of (all!) examples where a date is given to always show two digits for month, day, hour, minute and second (a short formatting sketch follows this list).

  • And slightly further down in CF-1.7 the following line appears:

    Note: if the time zone is omitted the default is UTC, and if both time and time zone are omitted the default is 00:00:00 UTC.

    Which is confusing if the utc calendar is included, because the time zone has nothing to do with the utc calendar.
    I suggest changing this line to:
    Note: if the time zone is omitted the default is 00:00, and if both time and time zone are omitted the default is 00:00:00 00:00.
    I think this is the only place where UTC is mentioned in CF-1.7.

  • Regarding your Jan 28 suggestion:

    • I agree to leave out orbital position, Milankovitch etc. as much as possible. This is mostly something else than a pure time coordinate issue.
    • Are dates before 1972-01-01 allowed for the utc calendar? I think not, as it is difficult to precisely and uniquely fix an earlier reference time. And as I understand it, the utc calendar is intended for modern high-precision data.
    • I like your suggestion to give the reference time a specific subchapter (delete the initial "If" in the suggested text?). It helps to sort out things at one place, but I have a few comments:
      • You write

        When specifying the time coordinates for current observations, the UTC calendar should, as a rule, be relied on unless:

        I suggest that this is qualified with language explaining that the utc calendar is mainly intended for observations requiring high temporal precision. Otherwise one may end up with essentially consistent datasets that cannot be (directly) analysed together, because any software ambitious enough to check for consistency will treat the calendars (utc and gregorian) as incompatible. Could CF specify that gregorian is compatible with utc, but not the other way around? That is, if one is performing some task in gregorian "time space" it is OK to mix the two, but not in utc "time space". This might be very useful when stitching together data from different sources, or just drawing together data from different sources for some analysis.

      • The subheading "gregorian calendar:" should, I think, also cover the proleptic_gregorian calendar.

      • I find the explanation under the heading "360_day and 365_day calendars" unclear for the case of the 360_day calendar (it works fine for the 365_day calendar):

        • What does the sentence "Dates in reference times should be interpreted as approximately matching dates in the gregorian calendar" mean? How should the date 2019-02-30 in a 360_day calendar be interpreted, approximately, in the gregorian calendar -- as 2019-02-28 or as 2019-03-02?
        • And in the next sentence, "Where possible, analysis should be on a monthly basis and the months of these calendars can be treated as equivalent, for climate analysis, to the months of the gregorian calendar", it matters whether one uses all-30-day months or tries somehow to use the gregorian months' lengths.
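
As a sketch of the two-digit formatting mentioned above (reusing the convention's own example date): standard strftime-style formatting already zero-pads every field, so the change mainly affects the convention's example text:

    from datetime import datetime

    ref = datetime(1992, 10, 8, 15, 15, 42)
    print(ref.strftime("seconds since %Y-%m-%d %H:%M:%S"))
    # seconds since 1992-10-08 15:15:42 -- month and day are zero-padded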

@ChrisBarker-NOAA
Copy link
Contributor

@larsbarring wrote:

I suggest changing this line to:
Note: if the time zone is omitted the default is 00:00, and if both time and time zone are omitted the default is 00:00:00 00:00

Agreed: the term “UTC” is overloaded here, better to avoid it. However, I suggest “offset” rather than “time zone”. ISO 8601 only supports an offset.

(Yes, time zone is another overloaded term, so we should avoid it)

@martinjuckes
Copy link
Contributor

Hello Lars,

I agree that it makes sense to deal with reference time issues. It does run us into some codability issues .. but we should tackle that.

Units of Measure renovation: I agree that this needs sorting out, and should be done for CF-1.8. I'd just like to correct my earlier statement: a CF units string can be either a unit of measure (plus, for time, a reference time) or a scaling factor (for non-dimensional quantities, e.g. 1.e-6).

Single-digit dates: this appears to be a widely supported extension of ISO 8601. Conversely, many libraries don't support the shortened form specified by ISO 8601 (e.g. 19720101 for 1972-01-01). We could say that single-digit dates are not allowed for the new utc calendar, but I think the desire for backward compatibility will force us to keep supporting them for other calendars. Perhaps we could also deprecate them for gregorian and proleptic_gregorian?

Julian days: ISO 8601 supports dates of the form 1972-204 meaning day 204 in 1972. UDUNITS does something weird and buggy with dates in this format. The Julian day is not mentioned in the convention .. it could be interpreted as being supported by default, but perhaps we should deprecate the use of this format in the reference time? Similarly for 1972-W10 meaning week 10.

Before 9999BC: the ISO 8601 standard supports an extension for such dates (unlike the single-digit months, which are an unsupported extension) ... but support for the extension in software libraries is likely to be scarce. We could be restrictive here, to simplify the coding: e.g. "Reference times prior to 9999BC can be used, but must use the strict ISO 8601 extended format for the date, of the form -100000-01-01 with separator '-' and double-digit months and days [or -100000-001 if using Julian days]." It is possible to construct regex expressions for (a) the core ISO 8601 date, (b) the single-digit month/day extension with a separator, and (c) an extension for dates before 9999BC, as described here (see below).

UTC and accuracy: the motivation at the start of this thread certainly associated the demand for utc with accuracy, but I have introduced another use case: interoperability with people using the civil standard. A specific example would be the requirement for European institutions to use INSPIRE compliant metadata, which means using UTC time referencing (offsets are allowed, of course). This is for standardisation, not generally for accuracy. Yes, there is an issue about compatibility between UTC and our gregorian .. but it is not really a new problem, as we already have issues with different spatial grids and also with converting from the spherical Earth of most models to the slightly more complex geometry of the planet we live on. We could say that conversion between utc and gregorian can be done as suggested in the current text, but should in general not be done if the uncertainty in the time axis is specified to have a standard_error of less than 1 second (using an ancillary variable with standard name time standard_error). If people are concerned that users should respect a particular level of accuracy in the data or axes, it makes sense to specify that accuracy in the CF metadata designed to quantify accuracy.
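
As a sketch of that last suggestion (hypothetical variable names, assuming the netCDF4-python package), the accuracy of a time coordinate could be recorded like this:

    from netCDF4 import Dataset

    ds = Dataset("example.nc", "w")
    ds.createDimension("time", None)

    time = ds.createVariable("time", "f8", ("time",))
    time.standard_name = "time"
    time.units = "seconds since 2017-01-01 00:00:00"
    time.calendar = "gregorian"
    time.ancillary_variables = "time_error"

    # Ancillary variable quantifying the accuracy of the time coordinate;
    # software could decline a utc <-> gregorian conversion when this is
    # below 1 second.
    time_error = ds.createVariable("time_error", "f8", ("time",))
    time_error.standard_name = "time standard_error"
    time_error.units = "s"

    ds.close()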

360_day: the sentence "Dates in reference times should be interpreted as approximately matching dates in the gregorian calendar" is intended to provide guidance comparable to that provided for utc to gregorian conversion. "Approximately" has to mean "to within a few days", and it is not intended to specify an algorithm. I didn't want to go into how quantities should be compared (means for some, perhaps totals for others and variances adjusted for sample size etc).

Time-zones: I agree with @ChrisBarker-NOAA that we should avoid mentioning these. The term is often used in a way which is interchangeable with "time offset", but, strictly speaking, time zones are defined by national authorities and may have complex geographical boundaries. They are generally defined as offsets from UTC. All we need to deal with is offsets.

Here are some regex expressions for the year/month/day date formats discussed above:
(a) core 8601: ^([1-9][0-9]{3})(?:(?:(-)?)(1[0-2]|0[1-9])(?:(?(2)-|)(3[01]|0[1-9]|[12][0-9]))?)?$: matches YYYY-MM-DD or YYYYMMDD, or YYYY-MM, YYYY variations (but not YYYY-MMDD);
(b) single digit months: ^([1-9][0-9]{0,3})(?:-(1[0-2]|0[1-9]|[0-9])(?:-(3[01]|0[1-9]|[12][0-9]|[0-9]))?)?$: matches Y-M-D, YYYY-MM-DD and various combinations of digit counts between these;
(c) extended years: ^([-+]?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[01]|0[1-9]|[12][0-9])$: matches +/-YYYYYY-MM-DD;
(d) Julian days: ^([1-9][0-9]{3})-?(00[1-9]|0[1-9][0-9]|[12][0-9]{2}|3[0-5][0-9]|36[0-6])$ matches YYYY-DDD or YYYYDDD, with DDD between 001 and 366;
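
A quick sanity check of pattern (a), assuming Python's re module (which supports the (?(2)...) conditional used above):

    import re

    core_8601 = re.compile(
        r"([1-9][0-9]{3})(?:(?:(-)?)(1[0-2]|0[1-9])(?:(?(2)-|)(3[01]|0[1-9]|[12][0-9]))?)?"
    )

    for s in ["2018", "2018-01", "2018-01-15", "20180115", "2018-0115", "2018-13-01"]:
        print(s, bool(core_8601.fullmatch(s)))
    # The first four match; "2018-0115" and "2018-13-01" are rejected.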

@larsbarring
Copy link
Contributor

  • ISO8601 general: I was thinking that CF would support a subset of all the different date formats, but from your comment I get the impression that you have a more ambitious goal? I suggest that a reasonable subset should be enough (e.g. not including Julian days, week dates, and other variants that are important for "social and business" communication rather than for specifying the reference time of some data).

  • Single-digit date formats: Can CF specify that data produced according to the new convention rules should use the format YYYY-MM-DD? Old data produced according to old versions of the standard will of course still be acceptable. In particular I do not like formats like YY-MM-DD and Y-M-D. While ISO 8601 does specify the order Y, M, D, there will still be uncertainty as to whether this was in fact followed or whether there was a mistake somewhere. If CF specifies YYYY-MM-DD, almost all of this ambiguity disappears. One could of course switch MM and DD by mistake, but the chance of this happening is much smaller. If a compelling use case surfaces, it is always possible to deal with it then by allowing additional variants of the ISO 8601 formats.

  • UTC and accuracy: Fine, you have thought this through in much more detail than I have.

  • 360_day calendar: I understand, I think, what you are aiming at. And at first sight I agree. But does the sentence "Dates in reference times should be interpreted as approximately matching dates in the gregorian calendar" mean what I think it means, namely that the all-30-day months should be abandoned? I have been wrestling with this quite a few times, and sometimes it is best (or at least simple and acceptable) to keep the 30-day months, but for some applications it is better (even necessary) to use the usual gregorian month lengths and do some "magic trick" to make up for the missing days. I sense that there is a can of worms lurking here. For the 365_day calendar this is much less of a problem.

  • Time-zone: Accepted, "offset" is much better.

@martinjuckes
Copy link
Contributor

  • ISO8601: I'd be happy with a more restrictive approach going forward .. the issue is to ensure that we can handle all the use cases which exist in our archives, whether we deprecate them or not. Incidentally, I've worked out what UDUNITS2 does with dates of the form "2018-105". This should be interpreted as Julian day 105, but is actually interpreted as "2018-10-01T05:00:00".

The single-digit dates are required only for backward compatibility ... I agree it makes sense to deprecate them in future.

360_day calendar: I'm really not thinking of a specific approximation here. As you say, there are many different approaches, and, as far as I know, being approximate is the one thing they have in common. I'm not sure how you infer something about abandoning 30 day months, but I'm happy to drop the sentence if it causes confusion. Would it be better to say something like "Seasons may be interpreted as approximately matching seasons in the gregorian calendar"?

@larsbarring
Copy link
Contributor

360_day calendar: I was merely interpreting the phrase "...approximately matching dates in the gregorian calendar" according to wikipedia's piece on the variable length of months in the gregorian calendar. Would it be enough to write something like:

"According to this calendar a year has 12 month each having 30 days. Months and seasons may be interpreted as approximately matching the corresponding months and seasons in the gregorian calendar."

This sentence does of course not say anything about dates that are "illegal" in the gregorian calendar. There is also something to consider with respect to intensive and extensive quantities. The former are easier to transfer between calendars than the latter. But that is another thread/ticket.
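
(Decoding dates that are "illegal" in the gregorian calendar is already routine for calendar-aware software; a small sketch with the cftime package, assuming it is installed:

    import cftime

    # 2019-02-30 -- the example date raised earlier in this thread -- is a
    # perfectly good date in the 360_day calendar, though not in the gregorian one.
    d = cftime.num2date(59.5, "days since 2019-01-01", calendar="360_day")
    print(d)  # 2019-02-30 12:00:00

The open question is only how such dates should be compared across calendars.)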

@martinjuckes
Copy link
Contributor

Yes, I like that wording. It says, I believe, as much as the convention needs to say on this topic.

@ChrisBarker-NOAA
Copy link
Contributor

Maybe we could put all this discussion about string formatting of date, units, etc in a new issue???

This is getting really messy :-)

Or better yet, a PR -- so we could be clearly discussing the document text.

@martinjuckes
Copy link
Contributor

I've checked the CF Conventions and udunits more carefully, and we can probably get away with a simpler approach, as Lars suggested. Firstly, there is no indication that the ISO 8601 short form ("20180101" for "2018-01-01") has ever been acceptable, so we can insist on having a "-" as a delimiter, and that will make things much easier.

I think we may have reached the point where we can move to a pull request ...

@JimBiardCics
Copy link
Contributor Author

I agree with @ChrisBarker-NOAA. Changing the formatting rules for the reference time is out of the scope of this issue. You need to get input and involvement from paleo people before you start messing with some of what you are suggesting, and from the rest of the community for others.

@JimBiardCics
Copy link
Contributor Author

I've been out sick and visiting parents, so here's a couple of observations from reading the last week of discussion.

The length of one day is currently ~2 milliseconds longer than the "defined" length of 86,400 seconds. It sounds small, but it adds up: 27 leap seconds have been inserted since 1972, and the TAI-UTC offset now stands at 37 seconds (counting the initial 10-second offset). As @martinjuckes pointed out, the earth's rotation is also slowing, and doing so in a variable fashion. That's why we have to rely on a leap second table, as opposed to a leap second formula, and why there's no 'proleptic UTC' system.
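
A back-of-envelope check of that rate (the ~2 ms figure is approximate and varies from year to year):

    excess_per_day = 0.002                 # seconds by which a real day exceeds 86,400 s
    days_per_leap_second = 1.0 / excess_per_day
    print(days_per_leap_second)            # 500.0 days, i.e. roughly one leap second
    print(days_per_leap_second / 365.25)   # every ~1.4 years at this rate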

Julian days aren't really a thing. There's Julian Date (JD), which is a count of elapsed days since an agreed epoch, which avoids the whole leap year issue, and there's day of year, which is a (usually) 1-based count of days within a year. I'd suggest we avoid both as far as this discussion is concerned.

I agree that we should minimize our references to both UTC and UDUNITS in our text. I think it's sufficient to state that the time units that can be used are limited to those defined by UDUNITS. We also need to deprecate the use of the units month and year (except for paleo years, megayears, etc?), but I think we should be able to have our entanglement with UDUNITS end there.
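
On deprecating month and year: the root of the trouble is that UDUNITS defines the year as the tropical year and the month as one twelfth of it, so neither lines up with calendar months or calendar years. A quick illustration (3.15569259747e7 s is the UDUNITS definition of the year):

    udunits_year_s = 3.15569259747e7       # the UDUNITS "year", in seconds
    print(udunits_year_s / 86400)          # 365.242198781 days
    print(udunits_year_s / 86400 / 12)     # ~30.44 days for a UDUNITS "month"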

@JimBiardCics
Copy link
Contributor Author

I wish I could understand why I can't seem to get my point across regarding the very real ambiguities in past and present time stamps in relation to SI seconds. I feel like my position is being misrepresented again and again. At the risk of sending us off on another wild goose chase, I'm going to try one more time. Let me start by asking a question.

Do you agree that - satellite data aside - the vast majority of time observations people wish to store in netCDF files start out as time stamps? (If you disagree, please explain your reasoning.)

@ChrisBarker-NOAA
Copy link
Contributor

ChrisBarker-NOAA commented Feb 12, 2019

@JimBiardCics:

please clarify: a "time observation" is, e.g., a measurement taken at a given time, as opposed to, say, model results, etc. Yes?

In that case, I'd say yes -- which is why I've been harping on the "store timestamps" use case! (more generally, a use-case focused discussion)

Not sure what this has to do with the SI seconds issue though ...

-CHB

@JimBiardCics
Copy link
Contributor Author

@ChrisBarker-NOAA Yes, let's also take model data off the table for the moment. (Although I think it may well be true in that case as well.)

@larsbarring
Copy link
Contributor

@JimBiardCics asks

Do you agree that - satellite data aside - the vast majority of time observations people wish to store in netCDF files start out as time stamps? (If you disagree, please explain your reasoning.)

and clarifies

Yes, let's also take model data off the table for the moment.

to which my answer is "Yes, I agree", and I am curious to see where this line of reasoning may take us.

@martinjuckes
Copy link
Contributor

Are we taking something like the wikipedia definition of "time stamp" here, i.e. "A timestamp is a sequence of characters or encoded information identifying when a certain event occurred"? In which case I suppose the answer has to be yes (also for model data), with the caveat that many observations are made at (approximately) fixed intervals and recorded in terms of a reference time and an elapsed time (usually without explicit reference to the approximations).

@martinjuckes
Copy link
Contributor

As progress here is slow, can I invite anyone who is interested in other aspects of describing the time axis in CF to take a look at #166? I don't think there is any specific overlap with this discussion, but it is a related theme.

@JonathanGregory
Copy link
Contributor

I have closed this issue because it's been dormant for more than twelve months. However, since it was last discussed, changes were agreed and made in the convention to address the difficulties of leap seconds and calendars. See issues 313, 298 and 319.
