Support for years with 5 digits #648
OK, that might be more complicated than I first thought. With 6 digits it is unclear whether the last two digits belong to the year or are the month. |
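To make the ambiguity concrete, here is a minimal sketch that lists every way a compact date string can be read if the year may be 4, 5 or 6 digits wide. The helper `interpretations` is hypothetical (not part of CMOR). It assumes the fields after the year are two-digit month, day, hour and minute, and applies only simple range checks.

```python
# Two-digit fields that may follow the year in a compact date string
# (assumed here: month, day, hour, minute), with their valid ranges.
FIELDS = [("month", 1, 12), ("day", 1, 31), ("hour", 0, 23), ("minute", 0, 59)]


def interpretations(datestr, year_widths=(4, 5, 6)):
    """Return every plausible reading of a compact date string."""
    readings = []
    for width in year_widths:
        rest = datestr[width:]
        # Need the full year plus a whole number of two-digit fields.
        if len(datestr) < width or len(rest) % 2 or len(rest) // 2 > len(FIELDS):
            continue
        reading = {"year": int(datestr[:width])}
        for i, (name, lo, hi) in enumerate(FIELDS[: len(rest) // 2]):
            value = int(rest[2 * i : 2 * i + 2])
            if not lo <= value <= hi:
                break  # out of range, so this year width is impossible
            reading[name] = value
        else:
            readings.append(reading)
    return readings


print(interpretations("100001"))
# [{'year': 1000, 'month': 1}, {'year': 100001}]
```

Range checks occasionally rule a reading out (e.g. `20221231` cannot be year 202212 with month 31). In general, though, a 6-digit year leaves the string ambiguous unless the frequency is known.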
@wachsylon interesting suggestion. I wonder if @taylor13 has some insights into how this has all been dealt with within PMIP (and the ISMIP6 experiment offshoots noted below), which considers very long timescales not well represented in modern calendars. I just took a peek and found this entry:

```json
"ism-lig127k-std":{
    "activity_id":[
        "ISMIP6"
    ],
    "additional_allowed_model_components":[
        ""
    ],
    "description":"Last interglacial simulation of ice sheet evolution driven by PMIP lig127k",
    "end_year":"",
    "experiment":"offline ice sheet forced by ISMIP6-specified AGCM last interglacial output",
    "experiment_id":"ism-lig127k-std",
    "min_number_yrs_per_sim":"20000",
    "parent_activity_id":[
        "no parent"
    ],
    "parent_experiment_id":[
        "no parent"
    ],
    "required_model_components":[
        "ISM"
    ],
    "start_year":"",
    "sub_experiment_id":[
        "none"
    ],
    "tier":"3"
},
```

It's worth pulling @jypeter into this discussion too |
The CMIP6 specifications for the "time_range" appearing in the filenames are:
So as @wachsylon has noted, if we allow 6 digits for year, unambiguous interpretation of the date is impossible without also determining the frequency. Since all current options have an even number of digits for the dates, we could allow year to be either 4 or 5 digits without knowledge of the frequency. The template would become Is that a good idea? I don't think modifying CMOR would be a problem, but folks trying to parse the date with a 5-digit year might have problems. Does anyone (@durack1 @mauzey1 @matthew-mizielinski @jypeter @mjuckes @davidhassell @martinjuckes) know of any CMIP infrastructure software that parses the dates in the CMIP6 file names? |
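A sketch of why the 4-or-5-digit option can work without knowing the frequency. It assumes the template is a [Y]YYYY year followed by zero or more two-digit fields (MM, DD, hh, mm), so the parity of the string length alone gives the year width. `split_year` and `split_time_range` are illustrative names, not existing CMOR or CMIP functions.

```python
import re

# Assumed template: [Y]YYYY[MM[DD[hh[mm]]]]. Every optional field adds two
# digits, so an even length means a 4-digit year and an odd length a 5-digit one.
COMPACT_DATE = re.compile(r"^\d{4,13}$")


def split_year(datestr):
    """Return (year, remaining digits) for a compact date with a 4- or 5-digit year."""
    if not COMPACT_DATE.match(datestr):
        raise ValueError(f"not a compact date: {datestr!r}")
    width = 5 if len(datestr) % 2 else 4
    return int(datestr[:width]), datestr[width:]


def split_time_range(time_range):
    """Split an 'N1-N2' time_range and parse both ends."""
    start, end = time_range.split("-")
    return split_year(start), split_year(end)


print(split_time_range("185001-201412"))   # ((1850, '01'), (2014, '12'))
print(split_time_range("1000001-1099912")) # ((10000, '01'), (10999, '12'))
```

This only holds while the year is limited to 4 or 5 digits. A 6-digit year would again collide with a 4-digit year plus a month.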
@MartinaSt pinging you here |
If we allow [Y]YYYY, that would allow different numbers of digits within atomic datasets. E.g., a range starting from 0001 up to 99999 would look awkward; however, I cannot think of an issue any software would have. As for ism-lig127k-std, it could be that the request only includes yearly frequencies, so there would be no ambiguities for that experiment. I learned that in our paleo project PalMod2, we have experiments going beyond 100k years AND monthly frequency output to be published. A solution might be to use |
Never mind! Even daily output is requested :) |
For this edge case I don't have a big problem with extending the format to allow for one extra digit to cover years 10k-99k, but as Karl notes going to a 6 digit year will make interpretation of the date numbering with the current naming scheme tricky. We need to have a think about whether there are some sensible tweaks to the naming convention we use for the future to explicitly include frequency, without introducing too much in the way of disruption for users. I wouldn't be surprised if some downstream tools will struggle to interpret the new date strings as and when they come across data formatted in this way, but as noted above this is the only experiment within CMIP6 that has this extent. As an experiment I've just run a test and have managed to produce a file for an existing CMIP6 simulation with a 5 digit year; The next question I would pose would be whether the ESGF publisher and associated systems will be happy with this (@sashakames -- any thoughts). |
One clarification. I wrote the template as [Y]YYYY because we want it to be generally backward compatible. For runs that might be expected to have values larger than 9999, we might recommend or insist that all 5 digits be included for all years, so, for example, "02022", not "2022", would designate this year in such runs. |
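A small illustration of why insisting on a fixed 5-digit width within a run helps. Zero-padded years sort correctly as plain strings (as file listings usually do), whereas mixing 4- and 5-digit years does not. `format_year` is a hypothetical helper, not a CMOR function.

```python
def format_year(year, width=5):
    """Zero-pad a year to a fixed width, e.g. 2022 -> '02022'."""
    return f"{year:0{width}d}"


unpadded = ["9000", "9999", "10000", "10500"]  # already in chronological order
padded = [format_year(int(y)) for y in unpadded]

print(sorted(unpadded))  # ['10000', '10500', '9000', '9999']  (string order breaks)
print(sorted(padded))    # ['09000', '09999', '10000', '10500']
```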
Haven't thought this through, but the time format could be tweaked from |
Yes, for a future DRS version, we could alter it as you suggest (although the hyphen separating the two dates would be more difficult to identify; I guess you could require the year to be at least 3 digits and search for the hyphen that precedes a string segment with more than 2 characters and no hyphen, but that is a bit complicated). The new template would not be backward compatible with the current DRS, so probably not a good option for immediate adoption. |
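The hyphen-search heuristic described above can be sketched as follows. The proposed format is not spelled out in this thread, so this assumes each date is written ISO-style as Y..Y[-MM[-DD]], with years of at least 3 digits and all other fields exactly 2. `split_iso_range` is a hypothetical helper.

```python
def split_iso_range(time_range):
    """Split 'Y..Y[-MM[-DD]]-Y..Y[-MM[-DD]]' into (start, end).

    Years have at least 3 digits and every other field exactly 2, so the end
    date begins at the first segment (after the first) longer than 2 characters.
    """
    segments = time_range.split("-")
    for i in range(1, len(segments)):
        if len(segments[i]) > 2:
            return "-".join(segments[:i]), "-".join(segments[i:])
    raise ValueError(f"no end date found in {time_range!r}")


print(split_iso_range("10000-01-01-10999-12-31"))  # ('10000-01-01', '10999-12-31')
```

As noted, it works but is fiddly. It would also break if negative years were ever written with a leading minus sign.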
If we take this route we could go with a double dash as the separator; e.g. |
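With a double dash, splitting becomes trivial, because a single ISO-style date never contains "--". The following is a sketch under the assumption that dates look like `10000-01-01--10999-12-31`. The format and `parse_double_dash_range` are illustrative, not an agreed DRS rule.

```python
import re

# One ISO 8601-style date of any year width: Y..Y[-MM[-DD]].
ISO_DATE = re.compile(r"^(\d+)(?:-(\d{2})(?:-(\d{2}))?)?$")


def parse_double_dash_range(time_range):
    """Parse 'start--end' into two tuples of integers."""
    start, sep, end = time_range.partition("--")
    if not sep:
        raise ValueError(f"missing '--' separator in {time_range!r}")
    parsed = []
    for part in (start, end):
        match = ISO_DATE.match(part)
        if not match:
            raise ValueError(f"not an ISO 8601-style date: {part!r}")
        parsed.append(tuple(int(g) for g in match.groups() if g is not None))
    return tuple(parsed)


print(parse_double_dash_range("10000-01-01--10999-12-31"))
# ((10000, 1, 1), (10999, 12, 31))
```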
Just thinking aloud; to have 5 digits used for years within an experiment we'd need to have the start / end dates or number of years included in the *

*A suitably fast model and commitment from the scientists running it would be required.
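If the start year and run length were recorded for an experiment (for ism-lig127k-std only `min_number_yrs_per_sim` is filled in, while `start_year` and `end_year` are empty), the year width for that experiment could be fixed up front. This is a minimal sketch with a hypothetical helper. It assumes positive years.

```python
def year_width(start_year, n_years, min_width=4):
    """Digits needed so that every year of a run shares one width."""
    end_year = start_year + n_years - 1
    return max(min_width, len(str(end_year)))


print(year_width(1, 20000))   # 5, e.g. a 20000-year run starting in year 1
print(year_width(1850, 165))  # 4, a historical-length run
```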
Exactly, I don't see a path forward that doesn't break the existing YYYYMMDD DRS-defined format that is expected by CMIP6, but maybe I am missing something? |
As far as publishing, the first concern is ensuring that Python can parse the "days since YYYY[Y]-MM-DD" units string; we have been tripped up by several atypically formatted years with leading 0's. The second is whether Python's timedelta supports such long year intervals, so that the full range can be represented. I'm not sure to what extent those are tested. To clarify, publishing is unaffected by the file naming scheme. |
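For reference, here is a quick check with Python's standard library only (no CMIP tooling assumed). `timedelta` itself can span far more than 100 kyr. However, `datetime.date` and `datetime.datetime` cannot represent any year above `datetime.MAXYEAR` (9999), so a reference date such as 10000-01-01 has to be handled by a calendar-aware package (e.g. cftime or cdtime) rather than the built-in types.

```python
import datetime

# The built-in date/datetime types stop at year 9999 ...
print(datetime.MAXYEAR)  # 9999
try:
    datetime.date(10000, 1, 1)
except ValueError as err:
    print("date(10000, 1, 1) fails:", err)

# ... but timedelta can hold up to 999,999,999 days (roughly 2.7 million years).
print(datetime.timedelta.max.days)  # 999999999
span = datetime.timedelta(days=126_000 * 365)  # ~126 kyr of 365-day years
print(span.days)  # 45990000
```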
You raise a good (different) point. If the usual Python codes can't handle the "units" attribute when the year exceeds "9999", or if they can't calculate elapsed time for those units, we're in trouble. Does anyone know of limitations of cdtime and similar modules? |
@sashakames that was where my mind had started to wander too; within CF there are no examples that default from the They also include a paleoclimate calendar, which is:

```
double time(time) ;
  time:long_name = "time" ;
  time:units = "days since 1-1-1 0:0:0" ;
  time:calendar = "126 kyr B.P." ;
  time:month_lengths = 34, 31, 32, 30, 29, 27, 28, 28, 28, 32, 32, 34 ;
```

Details are from https://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/build/ch04s04.html. I agree that testing whatever we work toward with the relevant software packages is key. |
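To show how such a non-standard calendar is decoded, here is a minimal sketch that turns "days since 1-1-1" into (year, month, day) using the `month_lengths` from the CF example above. It assumes, as in that example, that no `leap_year` attribute is given, so every year has the same 365 days. `decode` is a hypothetical helper, not part of any CF library.

```python
# month_lengths from the CF paleoclimate example; they sum to 365 days.
MONTH_LENGTHS = [34, 31, 32, 30, 29, 27, 28, 28, 28, 32, 32, 34]
YEAR_LENGTH = sum(MONTH_LENGTHS)


def decode(days_since_start):
    """Convert whole days since 1-1-1 0:0:0 into (year, month, day)."""
    year_index, day_of_year = divmod(days_since_start, YEAR_LENGTH)
    month_start = 0
    for month, length in enumerate(MONTH_LENGTHS, start=1):
        if day_of_year < month_start + length:
            return year_index + 1, month, day_of_year - month_start + 1
        month_start += length


print(decode(34))   # (1, 2, 1): month 1 has 34 days in this calendar
print(decode(365))  # (2, 1, 1)
```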
OK, and that answers that:

```
In [5]: import cdtime
In [6]: cdtime.relativetime(31,"".join(['days since 10000-01-01 0:0:0.0']))
Out[6]: 31.000000 days since 10000-01-01 0:0:0.0
In [7]: cdtime.relativetime(31,"".join(['days since 100000-01-01 0:0:0.0']))
Out[7]: 31.000000 days since 100000-01-01 0:0:0.0
In [8]: a = cdtime.relativetime(31,"".join(['days since 100000-01-01 0:0:0.0']))
In [9]: a
Out[9]: 31.000000 days since 100000-01-01 0:0:0.0
In [10]: a.torel('days since 1-1-1')
Out[10]: 36523917.000000 days since 1-1-1
In [11]: a.torel('days since 1-1-1 12:12:12.5 -8.0')
Out[11]: 36523916.491522 days since 1-1-1 12:12:12.5 -8.0
```

Looks like |
@durack1 Good to know cdtime appears rather flexible, so it is a potential solution if there are problems with timedelta. |
It would be useful to pick up this thread with the experience that @tomvothecoder and @pochedls have been generating using |
Hi,
I only have experience with cftime, which certainly handles years with > 4 digits, but can only parse ISO 8601-style dates, e.g. with hyphen separators between the year, month and day.

```python
>>> import cftime
>>> t1 = cftime._dateparse('days since 1234567-12-01 12:00', 'Gregorian')
>>> t2 = cftime._dateparse('days since 1234568-01-01 12:00', 'Gregorian')
>>> t1
cftime.datetime(1234567, 12, 1, 12, 0, 0, 0, calendar='standard', has_year_zero=False)
>>> t2
cftime.datetime(1234568, 1, 1, 12, 0, 0, 0, calendar='standard', has_year_zero=False)
>>> t2 - t1
datetime.timedelta(days=31)
```

I presume that moving to ISO 8601-style dates would be too harmful to backwards compatibility?
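One point in favour: for existing files the mapping from the compact CMIP6 form to hyphenated ISO 8601-style dates is mechanical, so old names could be translated even if parsers reading file names would still need updating. This is a minimal sketch handling only year, month and day. `compact_to_iso` is a hypothetical helper. Strictly, ISO 8601 only allows years beyond 9999 through an "expanded" representation with a leading sign agreed between parties; the unsigned form below matches what cftime accepts in the example above.

```python
def compact_to_iso(datestr, year_width=4):
    """Rewrite 'YYYY', 'YYYYMM' or 'YYYYMMDD' as 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'."""
    rest = datestr[year_width:]
    if not datestr.isdigit() or len(datestr) < year_width or len(rest) not in (0, 2, 4):
        raise ValueError(f"unsupported date string: {datestr!r}")
    fields = [datestr[:year_width]] + [rest[i:i + 2] for i in range(0, len(rest), 2)]
    return "-".join(fields)


print(compact_to_iso("18500116"))              # 1850-01-16
print(compact_to_iso("1000001", year_width=5))  # 10000-01
```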
--
David Hassell
National Centre for Atmospheric Science
Department of Meteorology, University of Reading,
Earley Gate, PO Box 243, Reading RG6 6BB
http://www.met.reading.ac.uk/
|
For paleoclimate simulations (or simulations initiated in very early historic time, sometime before the Common Era), a negative year might appear (although this would rule out use of both the "standard" and "julian" calendars). Perhaps we should think about how that would be handled too. Perhaps insert a special character before the year? (e.g., "B" for BCE, or "M" for minus, or "N" for negative)
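A letter prefix keeps "-" out of the individual dates, so the N1-N2 separator stays unambiguous. This is a sketch assuming the "M" option, with hypothetical helpers `encode_year`/`decode_year`. Whether the prefix means BCE (no year 0) or a plain minus sign (astronomical numbering, which has a year 0) would still need to be pinned down. This sketch treats it as a minus sign.

```python
def encode_year(year, width=5, neg_prefix="M"):
    """Zero-pad |year| and mark negative years with a letter prefix."""
    digits = f"{abs(year):0{width}d}"
    return neg_prefix + digits if year < 0 else digits


def decode_year(token, neg_prefix="M"):
    """Inverse of encode_year."""
    if token.startswith(neg_prefix):
        return -int(token[len(neg_prefix):])
    return int(token)


print(encode_year(-126000))  # M126000
print(encode_year(2022))     # 02022
```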
I've just marked this as a CMOR 4.0 item, as it would be great to catch this and other tweaks as we spec out a next-gen roadmap |
As I read the above, we haven't really come to a consensus on how to proceed with this. |
Hi,
for paleo simulations, we have simulation runs which go beyond 10k years. CMOR only writes 4 digits for the years, which may lead to parsing problems when the simulation time goes beyond 10000 years.
Maybe CMOR could support a parameter 'DIGITS_YEARS'?
Best,
Fabi
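To illustrate the request, here is a sketch of how a time_range string could be built with a configurable year width. `DIGITS_YEARS` is only the name proposed here; it is not an existing CMOR option, and `make_time_range` is a hypothetical helper.

```python
def make_time_range(start, end, digits_years=4):
    """Build 'N1-N2' from (year, month, ...) tuples using a fixed year width."""

    def fmt(date):
        year, *rest = date
        if len(str(year)) > digits_years:
            raise ValueError(f"year {year} needs more than {digits_years} digits")
        return f"{year:0{digits_years}d}" + "".join(f"{v:02d}" for v in rest)

    return f"{fmt(start)}-{fmt(end)}"


print(make_time_range((1850, 1), (2014, 12)))                      # 185001-201412
print(make_time_range((10000, 1), (10999, 12), digits_years=5))  # 1000001-1099912
```

Raising an error rather than silently widening the year keeps every file of a run at one width.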