Add calendars gregorian_tai and gregorian_utc #148
I remember when CF discussed time a few years ago. It was the longest discussion I ever followed on CF. You have addressed the main points that I remember.
The proposal looks sensible to me. One item that might be worthwhile to discuss or mention is the way in which times should be specified when the calendar is "gregorian_tai". It seems to me that implicit in that definition is the requirement to store time as "seconds since ..." a reference date/time. For any time unit larger than seconds it will be difficult to do conversions to seconds consistently ... unless we define "minutes" as strictly 60 seconds and so on.
All units in time variables follow their UDUNITS definitions. According to UDUNITS, seconds are SI seconds, minutes are 60 seconds, hours are 3600 seconds, days are 86400 seconds, weeks are 604800 seconds (7 days), and fortnights are 1209600 seconds (14 days). Years are 31556925.9747 seconds (365.242198781 days). Months are a mess, with 1 month defined as being 1/12 of a year - 30.436849898 days. As CF says in section 4.4, time should not use units of months or years, as those quantities will produce unexpected results if you are not very careful with them. I see no problem in storing times in any units from yoctoseconds (1e-24 seconds) up to and including fortnights, as they are all clean and consistent multiples of SI seconds.
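As a quick Python sketch of the month trap (the month length is the UDUNITS value quoted above):

    from datetime import datetime, timedelta

    # UDUNITS: 1 month = 1/12 year = 30.436849898 days.
    UDUNITS_MONTH_DAYS = 30.436849898

    epoch = datetime(2000, 1, 1)
    # Decoding "1 months since 2000-01-01" with the UDUNITS definition:
    print(epoch + timedelta(days=UDUNITS_MONTH_DAYS))
    # prints roughly 2000-01-31 10:29:04 -- not 2000-02-01, as one might expect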
If you are going to specify seconds (or yoctoseconds), then is it necessary to specify what type of number (integer, real) it is, to make sure that the specified number can be large enough and precise enough to be useful? Specifically, if you are using some sort of integer, the number of seconds you need could exceed the maximum value for some integer types, and if you are using real numbers, there may not be enough precision to distinguish between one second and the next (when the number of seconds gets large).
A given data file is, by definition, using a particular type for a variable -- so yes, the file creator needs to be thoughtful about it, but I don't think CF has to say anything about it.
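A short NumPy sketch of the precision concern (the numbers are illustrative):

    import numpy as np

    # Roughly 49 years of seconds since a 1970 epoch.
    t32 = np.float32(1.55e9)
    print(t32 + np.float32(1.0) == t32)  # True: adding 1 second changes nothing
    print(np.spacing(t32))               # 128.0: adjacent float32 values are 128 s apart

    t64 = np.float64(1.55e9)
    print(np.spacing(t64))               # ~2.4e-07: float64 easily resolves one second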
Hello Jim, thanks for directing me here from the mail list discussion on 360 day calendars. As you might guess from my latest contribution to that discussion, I have reservations about relaxing the specification of ...

I'm not sure if I understand the comments in your proposal about non-metric time values. Take, for example, a short period spanning the last leap second, which occurred at 2016-12-31 23:59:60. As a result of this leap second being inserted ...

Software often adopts a simplified relationship between the counters in the time stamp and the metric time. An extreme case of this is the 360 day calendar we have been discussing on the link I mention above, in which we have 30 days to the months and 12 months to the year, so that all counter increments relate directly to specific time intervals. My understanding is that by default the time stamp (the date following "since") ...

All the references I've found indicate that the time elapsed in the UTC system is exactly, by definition, time elapsed as measured by the atomic clock. The only difference is that UTC includes a concept of days, hours, and minutes, and the ...

It seems to me that the distinction is between the Julian/Gregorian calendar, in which the interval between ...

Wouldn't it be sufficient, as far as the CF standard is concerned, to recognise that the Gregorian/Julian calendar is no longer ...

There is a separate problem concerning the UDUNITS implementation ... but if terminology is agreed here, we could discuss that on the UDUNITS repository issues.
@martinjuckes You are exactly correct about the proper interpretation of time around a leap second. A large number of existing observational datasets obtain time as a UTC time stamp and then convert it to an elapsed time since an epoch using "naive" software which does not take leap seconds into account. A growing number of observational datasets directly acquire precise and accurate elapsed times since an epoch, either from a GPS unit or a satellite operating system, or they acquire time stamps that don't include any leap seconds (TAI time stamps, for example) and convert them using naive software. As it currently stands, those creating the time variables have no way to indicate which way the data was acquired, and users have no way to tell how they should interpret the values. As I mentioned, the question is often unimportant because the time resolution of the data acquired is coarse enough that it doesn't matter, or because it comes from non-physical systems such as models. When the question is important, it can be critical. The CF explanation of the calendar attribute is ...
We are trying to make a way for people to indicate what they have done when it matters without burdening those for whom this is not important. Here's a table showing the impact of different processing sequences with respect to leap seconds when attempting to obtain a UTC time stamp from a time variable with an accurate UTC epoch date and time (assuming there are one or more leap seconds within the time variable range or since the epoch date and time).
Naive - the conversion is unaware of leap seconds.

The last two entries in the table are, in truth, equivalent to the middle two. It doesn't really matter whether you started with UTC time stamps, TAI time stamps, or GPS elapsed time counts - as long as you end up with accurate TAI-compatible elapsed time values, the conversion to correct UTC time stamps must be smart (properly handle leap seconds). I'm open to other names for the two new calendars, but I think the suggested names are reasonable.
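For illustration, a minimal Python sketch of naive versus smart conversion. The leap second table here is truncated to the single 2016-12-31 entry, so this is a sketch of the distinction rather than a complete implementation:

    from datetime import datetime, timedelta

    # Toy table: UTC instants immediately after an inserted leap second.
    LEAP_SECONDS = [datetime(2017, 1, 1)]

    epoch = datetime(2016, 12, 31, 23, 0, 0)  # accurate UTC epoch
    elapsed = 7200.0                           # true SI seconds since the epoch

    # Naive: pretends every UTC day is exactly 86400 seconds long.
    naive = epoch + timedelta(seconds=elapsed)

    # Smart: discount one second per leap second crossed in the interval.
    crossed = sum(1 for ls in LEAP_SECONDS if epoch < ls <= naive)
    smart = epoch + timedelta(seconds=elapsed - crossed)

    print(naive)  # 2017-01-01 01:00:00 -- one second late
    print(smart)  # 2017-01-01 00:59:59 -- the correct UTC time stamp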
Lots of good points here, but I think this should be very simple:
The fact is that UTC and TAI are slightly different calendars -- as such, users absolutely need to be able to specify which one is appropriate. Done. So this proposal is a great (maybe the only, really) compromise between the specification we need and backward compatibility. I am a bit confused by this, though: > the time values may not be fully metric. huh? not entirely sure what "fully metric" means, but if you have e.g.: seconds since 1980-01-01T12:00:00 then the actual values in the array should fully represent monotonically increasing values, and time[j] - time[i] should always represent exactly the number of seconds that has elapsed. what is non-metric about that??? BTW, this all is a really good reason to encode time in this way, rather than, say, ISO datetime strings.... -CHB
@JimBiardCics Thanks for that detailed reply. I think I understand now. I've also done a bit more background reading and learned how extensive this problem is, causing disruption to companies like Google and Amazon, who have both adopted messy (and mutually inconsistent) work-arounds to the problem. I'd like to split this into two related problems: (1) is reasonably easy to deal with (at first sight) ... we need two calendars which differ by the leap seconds, so that ...

There also appears to be a slight inconsistency in the convention between the introduction to section 4.4, which explains how the time stamp relates to UTC time, and subsection 4.4.1 (which you quote) which states that the time stamp will be interpreted according to the calendar. Perhaps this just needs a clarification that the introductory remarks only apply for specific calendars. There is a further complication in that Udunits neglects the leap seconds, so it is not accurate at the level you want to achieve here.

(2) introduces some complexity, and I think this is where the idea of a "non-metric" coordinate comes in. In an ideal world we might deal with (2) by improving the available tools, but we don't know when the next leap second is coming (it will be midnight on June 30th or December 31st .. and almost certainly not this year ... but beyond that we have to wait for a committee decision, so nothing can be programmed in advance). What I think you are suggesting, to address point (2), is that we allow a construction in which ...
Here is a CDL illustration of how I think this could work, with separate variables for (1) months and years (12 months to each year, months with variable numbers of days), (2) days, hours, minutes (60 minutes to the hour, 24 hours to the day, variable length minutes) and (3) seconds (SI units .. only counting seconds since the last minute). I've not included a true time axis, as trying to convert the times to UTC times in a single parameter would be error prone and defeat the point of the example.
About udunits:
I'm confused: CF "punts" to Udunits for unit definitions. But there is nothing in CF that says you have to use udunits for processing your data, and I also don't see any datetime manipulations in Udunits anyway (maybe I've missed something). So what does Udunits have to do with this conversation? Udunits does make the unfortunate choice of defining "year" and "month" as time units, but I thought CF already discouraged (if not banned) their use.
Is it? It's terribly problematic for libraries -- which is why most of them don't deal with it at all. But for CF, I'm not sure it matters: datetimes are encoded as: a_time_unit since a_datetime_stamp. TAI and UTC define seconds the same way. And no one should use a time unit that isn't a clearly defined multiple of seconds. The only difference between TAI and UTC here is when you want to convert that encoding to a datetime stamp -- exactly what datetime stamp you get depends on whether you are using TAI or UTC, but the seconds are still fine. If someone encodes a datetime in the future, and specifies TAI time, then clients will not be able to properly convert it to a datetime stamp -- but that isn't a CF problem.
Breaking these down into manageable chunks ...
Yeah, I can see that would be "non-metric" -- but why in the world would we want to allow that? and in fact, how could you allow that??? CF time encoding involves one datetime stamp, not two. It would be a really bad idea to somehow mix and match TAI and UTC in one time variable -- specify the timestamp in UTC, but say that processing tools should use TAI from there on?? two minutes is never, ever, anything but 120 seconds. Let's not confuse what I am calling a timestamp -- i.e. the "label" for a point in time -- with a time duration. So: the duration between 2016-12-31 23:59:00 and 2017-01-01 00:01:00 is 121 seconds in the UTC calendar, and 120 seconds in the TAI calendar. so: 120 seconds since 2016-12-31 23:59:00 will be a different timestamp if you use the TAI calendar than if you use the UTC calendar -- just as it may be different if you use any other calendar.... So what is the problem here? Granted -- if you want to accurately create or use a time variable with the TAI calendar, you need a library that can do that -- but that is not CF's problem.
@ChrisBarker-NOAA As @martinjuckes mentioned in his comment, I'm calling the UTC time 'non-metric' because a time variable for data sampled at a fixed rate based on UTC time stamps converted to elapsed time since an epoch without dealing with leap seconds may contain deviations from what you would expect. If you attempt to 'do math' with the contents, you may find that adding an interval to a time does not produce the time you expected and subtracting two times does not produce the interval expected. Let's say I have acquired data at regular intervals, dt, and I have captured the time by grabbing accurate UTC time stamps for each acquisition. If I construct my time variable by naively converting my acquired time stamps to elapsed time since my epoch UTC time stamp, I might have one or more problems lurking in my time variable.
The non-monotonicity problem is one that I don't even want to get into. And, again, for someone measuring things once an hour (for example) this is all pretty ignorable.
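A hypothetical sketch of how the repeated value arises: one-second sampling crosses the 2016-12-31 leap second, and the unrepresentable stamp 23:59:60 gets folded into the next second before a naive encode (as some acquisition software does):

    from datetime import datetime

    epoch = datetime(2016, 12, 31, 23, 59, 58)
    stamps = [
        datetime(2016, 12, 31, 23, 59, 58),
        datetime(2016, 12, 31, 23, 59, 59),
        datetime(2017, 1, 1, 0, 0, 0),  # really 23:59:60, folded forward
        datetime(2017, 1, 1, 0, 0, 0),
        datetime(2017, 1, 1, 0, 0, 1),
    ]

    # Naive encode of "seconds since 2016-12-31 23:59:58":
    time_var = [(s - epoch).total_seconds() for s in stamps]
    print(time_var)  # [0.0, 1.0, 2.0, 2.0, 3.0] -- a repeated coordinate value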
About @martinjuckes' CDL: I am really confused as to why anyone would ever want to do that. I remember a thread a while back about encoding time as ISO 8601 strings, rather than "time_unit since timestamp" -- at the time, I opposed the idea, but now we have an even better reason why. If we stick to the current CF convention, then all we need to do is specify the TAI calendar as a valid calendar (and clarify UTC vs TAI) -- that's it -- nothing else needs to change, and there is no ambiguity. Is the goal here to be able to specify TAI in a way that users can use it without a TAI-aware library? I think that's simply a bad idea -- if you don't have a TAI-aware library, you have no business working with TAI times (at least if you care about second-level precision).
@JimBiardCics wrote:
Thanks, got it. My take on this -- if you do that, you have created invalid, incorrect data. CF should not encode this as a valid thing to do. As far as I'm concerned, it's the same as if you did datetime math with a broken library that didn't do leap years correctly. And frankly, I'm not sure HOW we could accommodate it anyway -- I'm still a bit confused about exactly when leap seconds are applied to what (that is, I think *nix systems, for instance, will set the current time with leap seconds -- so the "UTC" timestamp is really "TAI as of the last time it was reset"). Which I think is the concern here -- if you are collecting data, and are getting a timestamp from a machine, you don't really know which leap seconds have been applied. But again, that's broken data.... If we were to try to accommodate this kind of broken data, I have no idea how one would do it. One of the reasons that leap seconds are not used in most time libraries is that they are not predictable. So a lib released last year may compute a different result than one released today -- how could we even encode that in CF?!?!
@ChrisBarker-NOAA The first idea here is to allow data producers a way to clearly indicate that the values in their time variables are metric elapsed time with none of the unexpected discrepancies that I referred to earlier. Instead of gregorian_tai, we could call the calendar gregorian_linear or gregorian_metric or some such. We thought to reference TAI because TAI is, at base, a metric, linear count of elapsed seconds since the TAI epoch date/time.

The second idea here is to allow data producers to clearly indicate that the values in their time variables are non-metric elapsed time potentially containing one or more of the unexpected discrepancies that I referred to earlier, and to indicate that in all but one case they will get correct UTC time stamps if they convert the time values to time stamps using a method that is unaware of leap seconds. This result is not guaranteed if you add an offset to a time value before conversion, and differences between time values may not produce correct results. You may obtain one or more incorrect time stamps if your time values fall in a leap second. Keep in mind that the potential errors are, as of today, going to be less than or equal to 37 seconds, with many of them being on the order of 1-2 seconds.

For backward compatibility, and for the vast number of datasets out there where errors of this magnitude are ignorable, the existing gregorian calendar (with a warning added in the Conventions section) will remain.

The use of the word metric is problematic because minds inevitably go to the metric system. I have used it this way so many times when thinking about this that I forget that it is confusing.
@martinjuckes The point here is that we have an existing way to represent time that has been used for quite a few years now. This was never a problem for climate model data or data acquired on hourly or longer time intervals. We may at some future point (CF 2.0?, CF 3.0?) want to consider some significantly different way of handling time. For CF 1.* we want to find a way to accommodate satellite and other high frequency data acquisition systems without imposing unneeded burdens on anyone. CF says that the purpose of the calendar attribute is to tell you what you need to know to convert the values in a time variable into time stamps. We aren't telling them (at least not directly) how we obtained the values in the time variable. We are telling them how to use them. @JonathanGregory, @marqh, and I came to the conclusion that, while there may be cases we didn't consider, pretty much every time variable anyone might create (within reason) would fall into one of three categories:
- The elapsed time values are fully metric and their relationship to the UTC epoch date and time is accurate. You must take leap seconds into account when converting these time values into UTC time stamps if you want full accuracy. (The gregorian_tai case.)
- The elapsed time values are almost certainly not fully metric and their relationship to the UTC epoch date and time is probably not accurate, but if you convert them to UTC time stamps without adding any offsets, using a method that does not take leap seconds into account, you will get UTC time stamps with full accuracy. (The gregorian_utc case.)
- We don't have a clue about the metricity or accuracy of the elapsed time values or the epoch date and time. At least not to within 37 seconds. And we don't care. (The updated gregorian case.)
Time representation is a monster lurking just under the surface. Everything was fine until we looked down there. The only pure time is counts of SI seconds (or fractions thereof) since some agreed starting point. Everything else is burdened with thousands of years of history and compromise.
Hi @JimBiardCics: I don't know where you get the idea that CF time is different from SI time: as far as I can see CF time is defined as SI time measured in SI units. Making a change to depart from SI units is a big deal.
@ChrisBarker-NOAA <https://github.com/ChrisBarker-NOAA> The first idea here
is to allow data producers a way to clearly indicate that the values in
their time variables are metric elapsed time with none of the unexpected
discrepancies that I referred to earlier. Instead of *gregorian_tia*, we
could call the calendar *gregorian_linear* or *gregorian_metric* or some
such. We thought to reference TIA because TIA is, at base, a metric, linear
count of elapsed seconds since the TIA epoch date/time.
*gregorian_tai is fine.*
The second idea here is to allow data producers to clearly indicate that
the values in their time variables are non-metric elapsed time potentially
containing one or more of the unexpected discrepancies that I referred to
earlier,
OK, I vote to simply not allow that in CF. Those discrepancies are errors.
If second level precision is important, then don’t use a library without
that precision to write your data.
and to indicate that in all but one case they will get correct UTC time
stamps if they convert the time values to time stamps using a method that
is unaware of leap seconds.
I’m not sure we can know that for a given dataset. As leap seconds are
unpredictable, and computer clocks imprecise, the “UTC” time you get from a
system clock may or may not have had the last leap second adjustment at
just the right time. Granted, that’s only a potential second off, but still
...
If your application cares about second-level precision you should use TAI
time — isn’t that what GPSs use, for instance?
So in a CF time variable with, e.g.
seconds since a_datetime_stamp
The only thing you should need to know is whether the time stamp is UTC or
TAI. Other than that, a second is a second....
In practice, if you care about second-level precision, you really should
use a time stamp that’s close to your data anyway :-)
For backward compatibility, and for the vast number of datasets out there
where errors of this magnitude are ignorable, the existing *gregorian*
calendar (with a warning added in the Conventions section) will remain.
Agreed.
The use of the word *metric* is problematic
Well, I think I got that part anyway :-)
…-CHB
I just noticed these:
- The elapsed time values are fully metric and their relationship to the
UTC epoch date and time is accurate. (The *gregorian_tai* case.)
If the timestamp is UTC, you sure don’t want to call it TAI, do you?
- You must take leap seconds into account when converting these time
values into UTC time stamps if you want full accuracy.
Yes, because UTC does include leap seconds.
- The elapsed time values are almost certainly not fully metric and
their relationship to the UTC epoch date and time is probably not accurate,
but if you convert them to UTC time stamps without adding any offsets,
using a method that does not take leap seconds into account, you will get
UTC time stamps with full accuracy. (The *gregorian_utc* case.)
Then the timestamp is NOT UTC. I think this should be simply considered
incorrect, but it certainly shouldn’t be called something_utc, since it’s
not UTC.
- We don't have a clue about the metricity or accuracy of the elapsed
time values or the epoch date and time. At least not to within 37 seconds.
And we don't care. (The updated *gregorian* case.)
I think this should be what you call your “non-metric” case.
UTC and TAI are well defined (at least from now into the past); if we call
it something_utc it should be UTC — and if we call it something_tai it
should be TAI.
The vast majority of files in the wild are probably thought to be UTC, but
processed with non-leap second aware libraries. And as you point out, for
the vast majority of those it doesn’t matter.
I’m less optimistic than Jim that there are people writing files that are
“UTC but processed with a non-leap second aware library” that:
- Care about full precision
- Understand what they have done enough to guarantee reversibility.
I would think that folks working with data for which this matters would
have tools and libraries that do it right.
So I propose that “gregorian” mean — “ambiguous with regard to leap
seconds”, since that’s what files in the wild are.
gregorian_utc means “truly UTC, leap seconds and all”
gregorian_tai means “truly TAI — no leap seconds”
And that it really only applies to the timestamp, as the values should have
their usual definition.
…-CHB
The only pure time is counts of SI seconds (or fractions thereof) since
some agreed starting point.
Yup — which is why CF is the way it is :-)
Hi @ChrisBarker-NOAA: can't we keep the de facto definition of days in the Gregorian calendar being exactly 86400 seconds? Precision in the standard is a good thing. Clearly there will be less precision in the time values entered in many cases, but that is up to the users who will, hopefully, provide the level of accuracy required by their application. This would make gregorian equivalent to gregorian_tai.

You ask why it matters to the convention that the UTC calendar is not defined into the future past the next potential leap second: this is an issue of reproducibility and consistency. The changes are small, but we are only introducing this for people who care about small changes. The ambiguity only applies to the time-stamp: the interpretation of the statement x seconds since 2018-06-01 12:00:00Z will not be affected by a future leap second, but the meaning of y seconds since 2020-06-01 12:00:00Z may change if a leap second is introduced next year. One solution would be to disallow the use of a time-stamp after the next potential leap second.

I agree with Chris that we can't hope to retro-fix problems which may have occurred due to people putting in information inaccurately. What we need is a clear system which allows people to enter information accurately.
Hi All,
I would like to support Chris Barker’s approach.
A Gregorian day, and calendar, as defined by BIPM (Bureau International des Poids et Mesures), IERS (International Earth Rotation Service) and as notated in ISO8601, may have leap seconds, and must have them if they are declared, by definition. This will stay this way, even if the ITU WRC (International Telecommunication Union, World Radiocommunication Conference) does succeed in passing a motion abolishing leap seconds in the next decade or so.
A de facto definition of 86400 (SI) seconds in a Gregorian day is wrong and always has been.
The past ambiguous labelling of many NetCDF datasets as Gregorian is unfortunate and not easily fixed.
Chris B’s proposal is minimal (and therefore to be supported!), with three labels:
1. Ambiguous, status quo;
2. Proper Gregorian calendar, with leap seconds;
3. Proper International Atomic Timescale, without leap seconds, leap days, months, etc.
I also support at least one further, separate, label for a 360-day year.
I do not have strong views on what the actual labels should be.
Chris
PS Apologies for not delving into GitHub.
From: [email protected] <[email protected]> On Behalf Of Martin
Sent: 31 October 2018 11:05
To: cf-convention/cf-conventions <[email protected]>
Cc: Subscribed <[email protected]>
Subject: Re: [cf-convention/cf-conventions] Add calendars gregorian_tai and gregorian_utc (#148)
Hi @ChrisBarker-NOAA<https://github.com/ChrisBarker-NOAA> : can't we keep the de-fact definition of days in the Gregorian calendar being exactly 86400 seconds? Precision in the standard is a good thing. Clearly there will be less precision in the time values entered in many cases, but that is up to the users who will, hopefully, provide the level of accuracy required by their application. This would make gregorian equivalent to gregorian_tai.
You ask why it matters to the convention that the UTC calendar is not defined into the future past the next potential leap second: this is an issue of reproducibility and consistency. The changes are small, but we are only introducing this for people who care about small changes. The ambiguity only applies to the time-stamp: the interpretation of the statement x seconds since 2018-06-01 12:00:00Z will not be affected by future leap second, but the meaning of y seconds since 2020-06-01 12:00:00Z may change if a leap second is introduced next year. One solution would be to disallow the use of a time-stamp after the next potential leap second.
I agree with Chris that we can't hope to retro-fix problems which may have occurred due to people putting in information inaccurately. What we need is a clear system which allows people to enter information accurately.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub<#148 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AfI2gTx-0s4udX8sbpG_XaYHfIv9OMT2ks5uqYPlgaJpZM4X8l6W>.
|
@ChrisBarker-NOAA @martinjuckes There is nothing at all wrong with specifying that the epoch time stamp in the units attribute is UTC. I am explicitly assuming that all UTC time stamps are correct and accurate at the times they were acquired or constructed. If you are getting your time stamp from a PC that isn't actively synced by a time server, you shouldn't bother to use either of these new calendars.

When you read time out of a GPS unit, you can get a count of seconds since the GPS epoch, and I believe you can get a GPS time stamp that doesn't contain leap seconds (like TAI time stamps, but with a fixed offset from TAI), but most people get a UTC time stamp. The GPS messages carry the current leap second count and receivers apply it by default when generating time stamps. That's something I learned about from Aaron Sweeney a year or two ago - well after the big discussion we had about all of this a few years back. There are quite a lot of high-precision data acquisition platforms out there that start with accurate UTC time stamps obtained from GPS receivers. Many of them don't care about metricity. They just want their time stamps, but CF tells them that they must store time as elapsed time since an epoch.

There's not really any such thing as an elapsed time that is UTC vs TAI. At core, true elapsed time - a count of SI seconds since an event - is the same for both. The UTC time stamp for that event may not be identical to the TAI time stamp for that same event, but they both reference the same event, and the elapsed time since that event is the same no matter which time system you are using. The UTC time system provides a prescription in terms of leap seconds for how to keep UTC time stamps synchronized with the rotation of the earth. Just like the Gregorian calendar system provides a prescription in terms of leap days for how to keep date stamps synchronized with the orbit of the earth. The only difference - and it is an admittedly important one - is that the UTC leap second prescription does not follow a fixed formula like the Gregorian leap day prescription does. The UTC time system declares that the time stamp ...

Time stamps - whether UTC, TAI, GPS, or something else - are, in essence, a convenience for humans. No matter what time system or calendar system you use, the number of seconds or days that has elapsed since any given date and time is the same.

In a perfect world, all data producers would use a leap-second-aware function to turn their lists of time stamps into elapsed time values and all time variables would be "perfect". That would also force all users of those time variables to use a leap-second-aware function to turn the elapsed time values into time stamps. But that's not the world we live in. Does naive conversion of UTC time stamps into elapsed times have the potential to produce non-monotonic time coordinate variables that violate the CF conventions? Yes. Does it cause any real problems (for the vast majority of cases and instances of time) if people use this "broken" method for encoding and decoding their time stamps? No. At the end of the day, it doesn't matter much what process you used to create your elapsed time values.
For the small number of cases where differences of 1 - 37 seconds matter, we are trying to make a way for data producers to signal to data users how they should handle the values in their time variables while staying within the existing CF time framework and acknowledging CF and world history regarding the way we deal with time (which isn't very consistent and is composed of successive corrective overlays over centuries).
Hello @JimBiardCics, thanks again for a detailed response. I support the aim of allowing users to place either a UTC or TAI time stamp in the units statement (so that they can do whatever fits with their existing processes) and making it possible for them to declare which they are using. The suggestion of using the calendar attribute for this makes sense. I think we are agreed now that there is a unique and well defined mapping between these time stamps, and there is a unique and well defined way of calculating an elapsed time (in the SI sense) between any two such time stamps. I don't see how the layers of complexity need to come into this. The TAI time stamp counts up with 86400 seconds per TAI day, while UTC has a known selection of days with an extra second in the final minute. All we can do is define these things clearly; we can't force users to adopt best practice. As you say, some people don't have accurate clocks, just as some record temperature without accurate thermometers. I would disagree with you on one point: a non-monotonic data array is no problem, but a non-monotonic coordinate array is a problem. As Chris commented, people who need to know about this sort of problem are likely to have sorted it out before they get around to writing NetCDF files.
@JimBiardCics wrote:
OK, though a bit hard to talk about :-)
almost -- I think TAI time is also a perfectly valid system for a "perfect" world :-) But yeah, most datetime libraries do not handle leap seconds, which, kinda ironically, means that folks are using TAI time even if they think they are using UTC :-)
I'm not so sure -- I think having a time axis that is "non-metric" as you call it can be a real problem. Yes, it could potentially be turned into a series of correct UTC timestamps by reversing the same incorrect math used to produce it, but many use cases are working with the time axis in time units (seconds, hours, etc), and need it to have nifty properties like being monotonic and differentiable, etc.
Fair enough -- a worthy goal.
I disagree here -- the truth is that TAI is EASIER to deal with -- most datetime libraries handle it just fine; in fact, it is all they handle correctly. So I think a calendar that is explicitly TAI is a good idea. I think we are converging on a few decisions:
Note that the only actual difference between (2) and (3) is that the timestamp is in UTC or TAI, which have differed since some time in 1958, by up to 37 seconds. In either case, the values themselves would be "proper", and you could compute differences, etc easily and correctly.
Elapsed time since a UTC timestamp, but with elapsed time computed from a correct-with-regard-to-leap-seconds UTC time stamp with a library that does not account for leap seconds. This would mean that the values themselves may not be "metric". I think this is what Jim is proposing. (By the way, times into the future (when leap seconds are an unknown) as of the creation of the file should be disallowed.) Arguments for (Jim, you can add here :-) )
Arguments against:
December 31, 2016 at 23:59:60 UTC. And some time libraries do not allow that. Example: Python's datetime:
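In a standard CPython session, the leap-second stamp is rejected outright:

    >>> from datetime import datetime
    >>> datetime(2016, 12, 31, 23, 59, 60)
    Traceback (most recent call last):
      ...
    ValueError: second must be in 0..59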
Given these trade-offs, I think CF should not support this -- but if others feel differently, fine -- but do not call it "UTC" or "TAI"! -- and document it carefully! That last point is key -- this entire use-case is predicated on the idea that folks are working with full-on-proper-leap-second-aware UTC timestamps, but processing them with a non-leap-second-aware library -- and that this is a fully definable and reversible process. But at least with one commonly used datetime library (Python's built-in datetime), it simply will not work for every single case -- it will work for almost every case, so someone could process this data for years and never notice, but it's not actually correct! In fact, I suspect most computer systems can't handle December 31, 2016 at 23:59:60 UTC, and will never give you that value -- rather, (IIUC) they accommodate leap seconds by resetting the internal clock so that "seconds since the epoch" gives the correct UTC time when computed without leap seconds. But that reset happens at best one second too late (so that you won't get that invalid timestamp). All this leads me to believe that if anyone really cares about sub-second-level precision over a period of years, then they really, really should be using TAI, and if they THINK they are getting one-second precision, they probably aren't, or have hidden bugs waiting to happen. I don't think we should support that in CF. Final point:
OK -- but I suspect that yes, most people get a UTC timestamp, and most people don't understand the difference, and most people don't care about second-level accuracy over years. The "over years" part is because if you have, say, a GPS track you are trying to encode in CF, you should use a reference timestamp that is close to your data -- maybe the start of the day you took the track. So unless you happen to be collecting data when a leap second occurs, there will be no problem. For those few people that really do care about utmost precision -- they should use the TAI timestamp from their GPS -- and if it's a consumer-grade GPS that doesn't provide that -- they should get a new GPS! It's probably easier to figure out how to get TAI time from a GPS than it is to find a leap-second-aware time library :-) Side note: Is anyone aware of a proper leap-second aware time library?? Sorry for the really long note -- but I do think we are converging here, and I won't put up a stink if folks want to add the sort-of-utc calendar -- as long as it's well named and well documented. -CHB
@martinjuckes I think it is a bad idea to allow a TAI time stamp to be placed in the units attribute as the epoch for the elapsed times in the time variable. It adds nothing but confusion. The epoch should always be specified with a UTC time stamp. This causes no problems whatsoever. Someone going to the trouble of getting exact and correct elapsed times in their time variables will have no trouble figuring out the proper UTC time stamp for their epoch. Users, on the other hand, would be faced with figuring out which way to handle the epoch time stamp. If the user is aware of the implications of having exact and correct elapsed times in a time variable, they will also have the software on hand that is needed if they want to get time stamps from the time variable, or they will get it because it is important to them. If they aren't aware, a TAI epoch time stamp will maximize the error they will get if they perform a leap-second-naive conversion to get time stamps from the time variable contents and mistakenly think they are getting a UTC time stamp.

When I was talking about monotonicity I was intending it in reference to coordinate variables. I disagree with Chris (whether that is @ChrisBarker-NOAA or the unknown Chris who commented through the CF Metadata List account). People who are converting precise and accurate UTC time stamps into values in time variables using the tools most available to software developers for handling time without sensitivity to leap seconds are creating time variables that have the potential to be non-monotonic because of leap seconds. They have done it in the past, they are doing it now, and they will very likely continue to do so. It may be that they avoid the problem by breaking their data into files on UTC day boundaries (or hourly boundaries, etc), but if you aggregate those files you will create non-monotonic time variables. They will continue to do so because the potential one second per year non-monotonicity is less of a problem than forcing all their users to start decoding time variables into time stamps with leap-second-aware functions that are not readily available. As I said before, this is a compromise position. Telling a significant group of data producers that they must change their software in a way that will cause widespread problems for their users is a good way to get them to ignore you.
On another front, I'm perfectly happy to entertain different names for the new calendars. I'm not a particular fan of ...
Agreed -- and we need to accommodate that -- which we already do implicitly with "gregorian", and we should make it explicit in the sense that the whole leap-second thing is ambiguous.
Indeed. The question at hand is do we codify as correct what is, at best, a kludgy improper process? As we've all said, the ambiguity is a non-issue for the vast number of CF use cases. So the question really is -- are there folks for whom: ...
I honestly don't know -- but it's been suggested that we can use UTC always as the timestamp, 'cause anyone that understands the distinction (and cares about it) will have access to a leap-second-aware library. In which case, no -- there isn't a large user base. But given the dearth of leap-second aware libs, maybe we have no choice but to codify this kludgy encoding.
Hello Lars, I completely agree that we should design the standard to facilitate implementation, but that is not the same as supervising or managing implementation.

The UDUNITS issue opens up a can of worms. There are 41 instances of the word "udunits" in the current conventions document, many of them erroneous because they refer to properties of a long expired version of udunits. For example, "udunits.dat" is referred to several times (it has been replaced by udunits2-accepted.xml, udunits2-base.xml, udunits2-common.xml and udunits2-derived.xml), and the text I quoted above is from the old documentation. There is also a curious vagueness in the statement that "The value of the units attribute is a string that can be recognized by UNIDATA’s Udunits package [UDUNITS], with a few exceptions that are given below". I've always believed that units such as ...

On Milankovitch cycles: yes, that is certainly true. And also that the mean Gregorian year is not exactly equal to the mean period of the Earth's orbit around the sun, so just as solar noon will drift relative to our fixed day ...

In my proposed text on Jan 28th I suggested re-organising Karl's draft to create a new sub-section specifically for the ...
@larsbarring wrote:
Agreed: the term “UTC” is overloaded here, better to avoid it. However, I suggest “offset” rather than “time zone”. ISO 8601 only supports an offset. (Yes, time zone is another overloaded term, so we should avoid it.)
Hello Lars, I agree that it makes sense to deal with reference time issues. It does run us into some codability issues .. but we should tackle that.

Units of Measure renovation: I agree that this needs sorting out, and should be done for CF-1.8. I'd just like to correct my earlier statement: CF units string can be either ...

Single digit dates: this appears to be a widely supported extension of ISO 8601. Conversely, many libraries don't support the shortened form specified by ISO 8601 (e.g. ...).

Julian days: ISO 8601 supports dates of the form ...

Before 9999BC: the ISO 8601 standard supports an extension for such dates (unlike the single digit months, which is an unsupported extension) ... but support for the extension in software libraries is likely to be scarce. We could be restrictive here, to simplify the coding: e.g. "Reference times prior to 9999BC can be used, but must use the strict ISO 8601 extended format for the date, of form ...".

UTC and accuracy: the motivation at the start of this thread certainly associated the demand for ...

360_day: the sentence "Dates in reference times should be interpreted as approximately matching dates in the gregorian calendar" is intended to provide guidance comparable to that provided for ...

Time-zones: I agree with @ChrisBarker-NOAA that we should avoid mentioning these. The term is often used in a way which is interchangeable with "time offset", but, strictly speaking, time zones are defined by national authorities and may have complex geographical boundaries. They are generally defined as offsets from UTC. All we need to deal with is offsets.

Here are some regex expressions for the year/month/day date formats discussed above:
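As an illustration (these exact patterns are assumptions matching the forms discussed above, not an agreed specification):

    import re

    # Strict ISO 8601 extended date, e.g. 2018-01-01 (illustrative).
    iso_strict = re.compile(r"^\d{4}-\d{2}-\d{2}$")
    # Relaxed form with single-digit fields, e.g. 2018-1-1 (illustrative).
    relaxed = re.compile(r"^\d{1,4}-\d{1,2}-\d{1,2}$")
    # Expanded, signed years for dates far in the past, e.g. -009999-01-01.
    expanded = re.compile(r"^[+-]\d{6}-\d{2}-\d{2}$")

    print(bool(iso_strict.match("2018-01-01")))   # True
    print(bool(relaxed.match("2018-1-1")))        # True
    print(bool(expanded.match("-009999-01-01")))  # True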
The single digit dates are required only for backward compatibility ... I agree it makes sense to deprecate them in future. 360_day calendar: I'm really not thinking of a specific approximation here. As you say, there are many different approaches, and, as far as I know, being approximate is the one thing they have in common. I'm not sure how you infer something about abandoning 30 day months, but I'm happy to drop the sentence if it causes confusion. Would it be better to say something like "Seasons may be interpreted as approximately matching seasons in the gregorian calendar"?
360_day calendar: I was merely interpreting the phrase "...approximately matching dates in the gregorian calendar".

"According to this calendar a year has 12 months, each having 30 days. Months and seasons may be interpreted as approximately matching the corresponding months and seasons in the gregorian calendar."

This sentence does of course not say anything about dates that are "illegal" in the gregorian calendar. There is also something to consider with respect to intensive and extensive quantities. The former are easier to transfer between calendars than the latter. But that is another thread/ticket.
Yes, I like that wording. It says, I believe, as much as the convention needs to say on this topic.
Maybe we could put all this discussion about string formatting of dates, units, etc. in a new issue??? This is getting really messy :-) Or better yet, a PR -- so we could be clearly discussing the document text.
I've checked the CF Conventions and udunits more carefully, and we can probably get away with a simpler approach, as Lars suggested. Firstly, there is no indication that the ISO 8601 short form ("20180101" for "2018-01-01") has ever been acceptable, so we can insist on having a "-" as a delimiter, and that will make it much easier. I think we may have reached the point where we can move to a pull request ...
I agree with @ChrisBarker-NOAA. Changing the formatting rules for the reference time is out of the scope of this issue. You need to get input and involvement from paleo people before you start messing with some of what you are suggesting, and from the rest of the community for others.
I've been out sick and visiting parents, so here are a couple of observations from reading the last week of discussion. The length of one day is currently ~2 milliseconds longer than the "defined" length of 86,400 seconds. It sounds small, but it adds up to a total elapsed time shift of 37 seconds since 1972. As @martinjuckes pointed out, the earth's rotation is also slowing, and doing so in a variable fashion. That's why we have to rely on a leap second table, as opposed to a leap second formula, and why there's no 'proleptic UTC' system. Julian days aren't really a thing. There's Julian Date (JD), which is a count of elapsed days since an agreed epoch, which avoids the whole leap year issue, and there's day of year, which is a (usually) 1-based count of days within a year. I'd suggest we avoid both as far as this discussion is concerned. I agree that we should minimize our references to both UTC and UDUNITS in our text. I think it's sufficient to state that the time units that can be used are limited to those defined by UDUNITS. We also need to deprecate the use of the units month and year (except for paleo years, megayears, etc?), but I think we should be able to have our entanglement with UDUNITS end there.
I wish I could understand why I can't seem to get my point across regarding the very real ambiguities in past and present time stamps in relation to SI seconds. I feel like my position is being misrepresented again and again. At the risk of sending us off on another wild goose chase, I'm going to try one more time. Let me start by asking a question. Do you agree that - satellite data aside - the vast majority of time observations people wish to store in netCDF files start out as time stamps? (If you disagree, please explain your reasoning.)
please clarify: "time observations" is, e.g., a measurement taken at a given time, as opposed to, say, model results, etc. Yes? In that case, I'd say yes -- which is why I've been harping on the "store timestamps" use case! (more generally, a use-case focused discussion) Not sure what this has to do with the SI seconds issue though ... -CHB
@ChrisBarker-NOAA Yes, let's also take model data off the table for the moment. (Although I think it may well be true in that case as well.)
and clarifies ... to which my answer is "Yes, I agree", and I am curious to see where this line of reasoning may take us.
Are we taking something like the wikipedia definition of 'time stamp' here: i.e. "A timestamp is a sequence of characters or encoded information identifying when a certain event occurred"? In which case I suppose the answer has to be yes (also for model data), with the caveat that many observations are made at (approximately) fixed intervals and recorded in terms of a reference time and an elapsed time (usually without explicit reference to the approximations).
As progress here is slow, can I invite anyone who is interested in other aspects describing the time axis in CF to take a look at #166? I don't think there is any specific overlap with this discussion, but it is a related theme.
Introduction
The current CF time system does not address the presence or absence of leap seconds in data with a standard name of time. This is not an issue for model runs or data with time resolutions on the order of hours, days, etc, but it can be an issue for modern satellite swath data and other systems with time resolutions of tens of seconds or finer.

I have written a background section for this proposal, but I have put it at the end so that people don't have to scroll through it in order to get to the proposal itself. If something about the proposal seems unclear, I hope the background will help resolve your question.
Proposal
After past discussions with @JonathanGregory and again with him and @marqh at the 2018 CF Workshop, I propose the new calendars listed below and a change to existing calendar definitions.
gregorian_tai - When this calendar is called out, the epoch date and time stated in the units attribute are required to be Coordinated Universal Time (UTC), and the time values in the variable are required to be fully metric, representing the advance in International Atomic Time (TAI) since that epoch. Conversion of a time value in the variable to a UTC date and time must account for any leap seconds between the epoch date and the time being converted.

gregorian_utc - When this calendar is called out, the epoch date and time stated in the units attribute are required to be in UTC, and the time values in the variable are assumed to be conversions from UTC dates and times that did not account for leap seconds. As a consequence, the time values may not be fully metric. Conversion of a time value in the variable to a UTC date and time must not use leap seconds.

gregorian - When this calendar is called out, the epoch date stated in the units attribute is required to be in mixed Gregorian/Julian form. The epoch date and time have an unknown relationship to UTC. The time values in the variable may not be fully metric, and conversion of a time value in the variable to a date and time produces results of unknown precision.

the others - The other calendars all have an unknown relationship to UTC, similar to the gregorian calendar above.

The large majority of existing files (past and future) are based on artificial model time or don't need to record time precisely enough to require either of the new calendars (gregorian_tai or gregorian_utc). The modified definition of the gregorian calendar won't pose any problem for them. For users that know exactly how they obtained their times and how they processed them to get time values in a variable, the two new calendars allow them to tell users how to handle (and not handle) those time values.

Once we come to an agreement on the proposal, we can work out wording for Section 4.4 to reflect these new/changed calendar definitions.
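For concreteness, here is a minimal sketch of how a data producer might declare such a time coordinate. It assumes the netCDF4-python library; the calendar value is the proposed name, not something any current library interprets:

    import netCDF4

    ds = netCDF4.Dataset("example.nc", "w")
    ds.createDimension("time", None)
    t = ds.createVariable("time", "f8", ("time",))
    t.standard_name = "time"
    t.units = "seconds since 2017-01-01 00:00:00"  # epoch stated as a UTC time stamp
    t.calendar = "gregorian_tai"  # proposed: values are true counts of SI seconds
    t[:] = [0.0, 1.0, 2.0]
    ds.close()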
Background
There are three parts to the way people deal with time. The first part is the counting of the passing of time, the second part is the representation of time for human consumption, and the third is the relationship between the representation of time and the orbital and rotational cycles of the earth. This won't be a deep discussion, but I want to define a few terms here in the hopes that it will help make things clearer. For gory details, please feel free to consult Google and visit places such as the NIST and US Naval Observatory websites. I'm glossing over some things here, and many of my definitions are not precise. My goal is to provide a common framework for thinking about the proposal, as opposed to writing a textbook on the topic.
The first part is the simplest. This is time as a scalar quantity that grows at a fixed rate. This, precisely measured, is what people refer to as 'atomic time' - a count of cycles of an oscillator tuned to resonate with an electron level transition in a sample of super-cooled atoms. The international standard atomic time is known as International Atomic Time (TAI). So time in this sense is a counter that advances by one every SI second. (For simplicity, I am going to speak in terms of counts of seconds throughout this proposal.) No matter how you may represent time, whether with or without leap days or seconds, this time marches on at a fixed pace. This time is metric. You can do math operations on pairs or other groups of these times and get consistently correct results. In the rest of this proposal I'm going to refer to this kind of time as 'metric time'.
The second part, the representation of time, is all about how we break time up into minutes, hours, days, months, and years. Astronomy, culture, and history have all affected the way we represent time. When we display a time as YYYY-MM-DD HH:MM:SS, we are representing a point in time with a label. In the rest of this proposal I'm going to refer to this labeling of a point in time as a time stamp.
The third part, the synchronization of time stamps with the cycles of the planet, is where calendars come into play, and this is where things get ugly. Reaching way back in time, there were three basic units for time - the solar year, the lunar month, and the solar day. Unfortunately, these three units of time are not compatible with each other or with counts of seconds. A solar day is not (despite our definitions) an integer number of seconds in length, a lunar month is not an integer number of solar days (and we pretty much abandoned them in Western culture), and a solar year is not an integer number of solar days or lunar months in length. If you attempt to count time by incrementing a time stamp like an odometer - having a given element increment once each time the element below it has 'rolled over', you find that the time stamps pretty quickly get out of synchronization with the sun and the seasons.
The first attempts to address this asynchrony were leap days. The Julian calendar specified that every four years February would wait an extra day to roll over to March. The Gregorian calendar addressed a remaining asynchrony by specifying that in century years the leap day is added only every fourth century (rather than every time, as it otherwise would be). That was close enough for the technology of those days. Clocks weren't accurate enough at counting seconds to worry about anything else. But the addition of leap days (as well as months with random lengths) means that time stamps aren't metric. You can't do straightforward math with them.
In more recent times technology and science have advanced to the point that we can count seconds quite accurately, and we found that keeping the time stamp hours, minutes, and seconds sufficiently aligned with the rising of the sun each day requires the addition (or subtraction) of leap seconds. On an irregular basis, potentially twice a year, the last minute of a day is allowed to run to 60 before rolling over instead of 59 (or rolls over after 58, though it's lately been only additions). Coordinated Universal Time (UTC) is the standard for time stamps that include both leap days and leap seconds.
UTC time stamps represent the time in a human-readable form that is precise and synchronized with the cycles of the earth. But they aren't metric. It's not hard to deal with the leap days part because they follow a fixed pattern. But the leap seconds don't. If you try to calculate the interval between 2018-01-01 00:00:00 and 1972-01-01 00:00:00 without consulting a table of leap seconds and when they were applied, you will have a difference of 27 seconds between the time you get from your calculation and the time that has actually elapsed between those two time stamps. This isn't enough of a discrepancy to worry about for readings from rain gauges or measurements of daily average temperature, but an error of even one second can make a big difference for data from a polar-orbiting satellite moving at a rate of 7 km/second.
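Spelled out as a sketch (the 27 comes from a leap second table, which is exactly what a naive difference cannot know about):

    from datetime import datetime

    # Leap-second-blind difference between the two time stamps.
    naive = (datetime(2018, 1, 1) - datetime(1972, 1, 1)).total_seconds()
    leap_seconds_inserted = 27       # inserted between 1972 and 2018 (table lookup)
    true_elapsed = naive + leap_seconds_inserted
    print(true_elapsed - naive)      # 27.0 -- the discrepancy described above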
The clocks in our computers can add further complexity to measuring time. The vast majority of computers don't handle leap seconds. We typically attempt to address this by using time servers to keep our computer clocks synchronized, but this is done by altering the metric time count in the computer rather than modifying the time stamps by updating a table of leap seconds.
Furthermore, most computer software doesn't have 'leap second aware' libraries. When you take a perfectly exact UTC time stamp (perhaps taken from a GPS unit) and convert it to a count of seconds since an epoch using a time calculation function in your software, you are highly likely to have introduced an error of however many leap seconds have been added between your epoch and the time represented by the time stamp.
As a result of all this, many of the times written in netCDF files are not metric times, and there is no good way to know how to produce accurate time stamps from them. They may be perfectly metric within a given file or dataset, they may include skips or repeats, or they may harbor non-linearities where there are one or more leap seconds between two time values.
We have another minor issue for times prior to 1972-01-01. There's not much way to relate times prior to that epoch to times since - not to the tens of seconds or better level. I'd be surprised if this would ever be a significant problem in our domain.
To summarize, we have TAI, which is precise metric time. We have UTC, which is a precise, non-metric sequence of time stamps that are tied to TAI. And we have a whole host of ways that counts of time since an epoch stored in netCDF files can be inaccurate to a level as high as 37 seconds (the current leap-second offset between TAI and UTC).
Most uses of time in netCDF aren't concerned with this level of accuracy, but for those that are, it can be critical.