Proposed MAPL3 History Format
- Field splitting: define behavior and syntax. (Sidebar: do we allow crazy things like arithmetic on split fields? Someone asked...) Could we make field splitting just a use case of subsetting?
- File duration, time definition, and related time issues
- Do the template and the writing time, and only those 2 things, define which file is written to when it is time to write? In other words, is there no separate duration keyword? Which time index in that file is another matter; that depends on the frequency and reference time...
- Is time unlimited or fixed?
- Do we allow appending to files from previous executions if indicated by the template and file existence? Maybe that depends on whether time is unlimited or fixed? This has many possibilities for problems if we are not careful...
- Do we make the time index consistent for "missing" times that were not written, based on the template and the application start time? (Current History does not.) For example: the template is something_%y4%m2%d2.nc4, output is instantaneous with a frequency of 6 hours relative to 21z, and the application starts at 18z, so the first time written is 21z. Do we write to time index 4 or 1, and is the start time of the file 3z or 21z? There are "naturally" 4 times per file starting at 3z, but because the file did not exist before this write, 21z really is the first time we write to it. Old History would write to time index 1, with a start time of 21z. The question is whether we want to enforce consistency across ALL files, even if it means some files have time slices that are all missing.
- ForceZeroOffset from old History (i.e. do not timestamp time-averaged files at the midpoint of the averaging period, which is the default). Related question: for time-averaged collections, do we want metadata saying that the variable is time averaged, with the range for example?
- Do the answers to any of these questions above require a syntactical decision now before presenting outside of SI team?
- Chunking: another per-collection setting with override power in each variable? For that matter, deflate and bit shaving: global to the collection, overridable per variable?
- Can we eliminate the special monthly keyword?
- Make a decision on where to put the start/stop collection time.
- Can each field override the time "mode" (instantaneous vs mean vs min vs max)? MAPL2 History let the user override time-averaging to min or max; if the collection was instantaneous, this had no effect.
- vertical grid and vertical regridding specification
- regex
- level selection, is that just a variant of "vertical" regridding? Does this belong elsewhere? Because you could do this on a dimension that has nothing to do with the special "vertical" ungridded dimension, it could be just subsetting ANY ungridded dimension, or some sort of generic "subsetting" syntax?
- Variable sets, do we even need this?
- Do we expand the output limitations: mixed center/edge, 4-D variables (i.e. vertical + ungridded dimension)? Not a syntax question per se, but something we should decide on for the initial implementation.
- tile regridding (assume ESMF will have this done so don't have to worry about it like we do now in History so nothing to actually do syntax wise?)
For reference, all keywords in old history collection can be found here.
version: 2
allow_overwrite: false
experiment:
  id: MAPL-v3
  source: GEOSgcm-v10.22.0
  description: >
    long string across
    many lines
- Note the DAS needs the ability to turn a collection off mid-run; see end_datetime in the time handling section. Maybe that belongs here instead, i.e. the list of active collections carries the time constraints?
active_collections:
- geosgcm_prog
- geosgcm_surf
If the stop time (or the start time, if you want a collection to turn on later in the run) were embedded here, what would that even look like? The values of the list could be either a scalar or a map:
active_collections:
# this one has no constraints, on all the time
- geosgcm_tend
# as a time interval, give 2 ISO times; but what if we want this to be open ended?
- geosgcm_prog: [2004-01-10T09:00:00, 2004-01-11T03:00:00]
# as separate start/end, if one or the other is not provided assume open
- geosgcm_surf: {end_time: 2004-01-10T09:00:00}
- geosgcm_turb: {start_time: 2004-01-01T09:00:00}
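To make the scalar-or-map question concrete, here is a minimal Python sketch of how such entries might be normalized after the YAML has been loaded. Everything in it is an assumption for illustration: the function names `parse_active` and `is_active` are hypothetical, timestamps are assumed to arrive as ISO strings, the interval is treated as half-open (active from start up to but not including end), and a missing bound means open-ended.

```python
from datetime import datetime

def parse_active(entries):
    """Normalize active_collections entries to {name: (start, end)}.

    A bound of None means "open" (hypothetical convention).
    """
    windows = {}
    for entry in entries:
        if isinstance(entry, str):            # - geosgcm_tend
            windows[entry] = (None, None)
            continue
        (name, window), = entry.items()       # single-key map
        if isinstance(window, list):          # - name: [start, end]
            start, end = window
        else:                                 # - name: {start_time: ..., end_time: ...}
            start, end = window.get("start_time"), window.get("end_time")
        windows[name] = tuple(
            datetime.fromisoformat(t) if t else None for t in (start, end)
        )
    return windows

def is_active(window, now):
    """True if `now` falls in the half-open window [start, end)."""
    start, end = window
    return (start is None or now >= start) and (end is None or now < end)
```

Under these assumptions, the list form could express an open-ended interval by giving a null for the missing bound, which would make the two spellings equivalent.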
- What about things like selecting certain levels for output?
geoms:
  geom_1: &geom_1
    class: latlon
    im: 360
    jm: 180
    pole: PE
    dateline: DE
  geom_2: &geom_2
    class: swath
  geom_3: &geom_3
    class: trajectory
  geom_4: &geom_4
    class: station
  geom_5: &geom_5
    class: cubed-sphere
# This is just copying what is in the old collection...
vertical_grids:
  pressure-levels: &pressure-levels
    ref_var: DYN.PLE
    function: log
    levels: [1000, 975, 950, 925, 900, 875, 850, 825, 800, 775, 750, 725, 700, 650, 600, 550, 500, 450, 400, 350, 300, 250, 200, 150, 100, 70, 50, 40, 30, 20, 10, 7, 5, 4, 3, 2, 1, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1, 0.07, 0.05, 0.04, 0.03, 0.02]
    unit: hPa # What if this doesn't match the variable? MAPL2 had a conversion factor you could add
synoptic_start: &synoptic_start 2000-04-14T21:00:00
synoptic_end: &synoptic_end 2000-04-15T03:00:00
time_specs:
  # all times are ISO times
  # all frequencies are ISO durations
  # instantaneous relative to reference time
  six_hourly: &six_hourly
    mode: instantaneous
    frequency: PT6H # ISO duration
    ref_time: T21:00:00 # ISO time with no date
    start_datetime: *synoptic_start # optional, default is start of calendar
    end_datetime: *synoptic_end # optional, default is end of universe
  # instantaneous example on heartbeat
  heartbeat: &heartbeat
    # if frequency is heartbeat, ref_time is disallowed
    mode: instantaneous # instantaneous (default), time-averaged, min, max
    frequency: heartbeat # not default! dt of clock passed in...
  # time averaged output every 6 hours
  sixh_avg21: &sixh_avg21
    mode: time-averaged
    frequency: PT6H # default is dt of clock passed in (heartbeat)
    ref_time: T21:00:00 # if frequency is not heartbeat, must specify reference time
  # ref_time disallowed because frequency is a non-constant duration
  # natural ref_time is clearly beginning of month
  monthly: &monthly
    mode: time-averaged
    frequency: P1M
variable_sets:
  dyn:
    ...
  rad:
collections:
  geosgcm_prog:
    geom: *geom_1
    vertical_grid: *pressure-levels
    time_spec: *sixh_avg21
    template: '%e.%c.%y4%m2%d2_%h2%n2z.nc4'
    # anything after this would have sensible defaults
    archive: '%c/Y%y4' # do we need this?
    file_format: netcdf # default, will we even support another?
    # the following can be overridden per-entry in the fields
    compression_level: 1 # default 0
    bit_shave: 12 # default no bit shaving, all kept
    regrid_method: conservative # default bilinear
    chunking: [180,90,1,1]
    # The idea here is that the delimiter is how you separate the component/field
    delimiter: '.'
    var_list:
      # basic single field output
      PHIS: {expr: AGCM.PHIS, regrid_method: vote, vertical_method: ..., time_regrid: min/max/mean, units: 'ft', chunking: [90, 45, 1, 1] }
      PHIS: {expr: AGCM.PHIS} # Gocart has . in name..., sigh
      # debate if we should allow both or only 1 or the other if no alias desired
      - AGCM.PHIS
      - [AGCM.PHIS]
      # two different ways to express the item and alias
      - [AGCM.PHIS, phis]
      - {name: AGCM.PHIS, alias: phis}
      # you may want to import a field into the component grid comp
      # for use in an expression later, but not actually write it to file
      - {name: DU.x, exclude: true, units: feet}
      # vector, then vector with alias
      - DYN.agrid_wind
      - [DYN.agrid_wind, [u, v]]
      - {name: DYN.agrid_wind, alias: [u, v]}
      # expressions
      - {expr: DU.x + SU.y, alias: weird}
      # if items in expression are a vector, must specify which component
      - {expr: sqrt(DYN.agrid_wind[1]**2+DYN.agrid_wind[2]**2), alias: wind_speed}
      # or "dive" into vector like any other container?
      - {expr: sqrt(DYN.agrid_wind.u**2+DYN.agrid_wind.v**2), alias: wind_speed}
      # bundles, make the delimiter a general "diving"
      # From PHYSICS component, get MTRI bundle, from MTRI bundle get NI::NO3an1M
      - PHYSICS.MTRI.NI::NO3an1M
      - [PHYSICS.MTRI.NI::NO3an1IM, NISV]
      - {name: PHYSICS.NI::NO3an1IM, alias: NISV}
      # example of override collection defaults for an entry
      - {name: AGCM.PHIS, alias: phis, chunking: [90,90,1], compression_level: 2, bit_shave: 14, regrid_method: bilinear}
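The var_list above allows an entry to be a bare string, a short list, or a map, with per-entry overrides of the collection defaults. As a rough sketch of how a reader might reduce those forms to one shape, assuming the YAML has already been loaded into Python objects, the hypothetical helpers `normalize_entry` and `with_defaults` below are illustrative only and are not part of MAPL.

```python
def normalize_entry(entry):
    """Reduce one var_list item to a dict that always has an 'alias' key."""
    if isinstance(entry, str):                # - AGCM.PHIS
        return {"name": entry, "alias": None}
    if isinstance(entry, list):               # - [AGCM.PHIS] or - [AGCM.PHIS, phis]
        name, *rest = entry
        return {"name": name, "alias": rest[0] if rest else None}
    if isinstance(entry, dict):               # - {name: ..., alias: ...} or {expr: ...}
        return {"alias": None, **entry}
    raise TypeError(f"unsupported var_list entry: {entry!r}")

def with_defaults(entry, collection_defaults):
    """Per-entry keys win over collection-level defaults."""
    return {**collection_defaults, **entry}

defaults = {"compression_level": 1, "bit_shave": 12, "regrid_method": "conservative"}
phis = with_defaults(
    normalize_entry({"name": "AGCM.PHIS", "alias": "phis", "compression_level": 2}),
    defaults,
)
# phis keeps bit_shave and regrid_method from the collection, but compression_level is 2
```

If only one of the string and single-element-list forms is kept, the corresponding branch simply goes away.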
Old History has the "duration" keyword for collections, but it is very problematic. New options are explored below.
Note that all options start from the same premise: the time to be written and the template determine which file you write to when it is time to write, NOTHING ELSE. All the variations affect WHICH TIME INDEX you write to WITHIN A FILE.
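As an illustration of that premise, here is a small Python sketch of template evaluation. The token meanings are assumptions read off the example template (`%y4` year, `%m2` month, `%d2` day, `%h2` hour, `%n2` minute, `%e` experiment id, `%c` collection name), and `evaluate_template` is a hypothetical name, not an existing MAPL routine.

```python
from datetime import datetime

def evaluate_template(template, when, expid, collection):
    """Expand the date/time and name tokens in a file template."""
    tokens = {
        "%y4": f"{when.year:04d}",
        "%m2": f"{when.month:02d}",
        "%d2": f"{when.day:02d}",
        "%h2": f"{when.hour:02d}",
        "%n2": f"{when.minute:02d}",
        "%e": expid,
        "%c": collection,
    }
    for token, value in tokens.items():
        template = template.replace(token, value)
    return template

name = evaluate_template(
    "%e.%c.%y4%m2%d2_%h2%n2z.nc4", datetime(2004, 1, 10, 21, 0), "MAPL-v3", "geosgcm_prog"
)
# name == "MAPL-v3.geosgcm_prog.20040110_2100z.nc4"
```

With a daily template such as something_%y4%m2%d2.nc4, every write time on the same day evaluates to the same file name, which is why the time index within the file becomes the only remaining question.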
Here are 4 variations
- Each time History decides to write, it evaluates the template. This IS the file that will be written to, no more, no less. The evaluated template is based on "some time" provided by History (what time that is depends on things like whether the collection is time-averaged).
- It will write the next time index in the file, with a time value of this "some time" in point 1
- Check whether the file named by the evaluated template has been written to during this execution. If it has not, there are two cases:
- If the file does not exist, create it. By definition the "next" time index is 1; if time is unlimited there is nothing else to determine. Store this so we know the time index for the next write.
- If the file does exist (presumably from a previous segment, but what if a file just happens to have the same name? This would need good checking that History really did write it, etc.), determine how many times have been written to that file. Keep appending to it with the appropriate next time index.
- If the file has already been written to during this execution, the next time index is already known, so write to that time index.
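The bookkeeping in this first variation can be sketched in a few lines of Python. This is only an in-memory model: `AppendingWriter` is a hypothetical name, and the `on_disk` dictionary stands in for actually inspecting files left by a previous segment. Indices are 1-based, as in the text.

```python
class AppendingWriter:
    """Model of 'append to whatever the template names' bookkeeping."""

    def __init__(self, on_disk=None):
        # filename -> number of time slices already in the file (stand-in for disk)
        self.on_disk = dict(on_disk or {})
        # files touched during this execution -> next time index to write
        self.next_index = {}

    def write(self, filename):
        """Return the time index used for this write."""
        if filename not in self.next_index:
            # First touch this execution: a new file starts at index 1,
            # an existing file continues after its last written slice.
            self.next_index[filename] = self.on_disk.get(filename, 0) + 1
        index = self.next_index[filename]
        self.next_index[filename] = index + 1
        return index

writer = AppendingWriter(on_disk={"prog.20040110.nc4": 2})
first = writer.write("prog.20040110.nc4")   # continues an existing file
second = writer.write("prog.20040110.nc4")
fresh = writer.write("prog.20040111.nc4")   # file did not exist
```

The check that an existing file really was written by History is deliberately left out here; that validation is the hard part the text warns about.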
- Each time History decides to write, it evaluates the template. This IS the file that will be written to, no more, no less. The evaluated template is based on "some time" provided by History (what time that is depends on things like whether the collection is time-averaged).
- It will write the next time index in the file, with a time value of this "some time" in point 1
- If the file has not been written to during this execution, check whether the file we want to write to already exists; if so, die. If not, use the frequency and ref_time to determine how many time slices are left to go into this file before we hit a new file, create said file, and start the time index at 1 for bookkeeping purposes. Write time index 1.
- If the file has been written to already, write to "next" time index
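A minimal sketch of this second variation follows, under stated assumptions: the frequency is a constant `timedelta` (so a monthly frequency would need different handling), the caller supplies `file_end`, the first instant that the template maps to a different file, and the set `existing_files` stands in for a real existence check. The names `slices_left_in_file` and `create_file` are hypothetical.

```python
from datetime import datetime, timedelta

def slices_left_in_file(first_write, frequency, file_end):
    """Count write times first_write + k*frequency that fall before file_end."""
    if frequency <= timedelta(0):
        raise ValueError("frequency must be positive")
    count = 0
    t = first_write
    while t < file_end:
        count += 1
        t += frequency
    return count

def create_file(filename, existing_files, first_write, frequency, file_end):
    """Refuse to reuse a file; otherwise size it for the remaining slices."""
    if filename in existing_files:
        raise FileExistsError(f"{filename} already exists")
    return {
        "filename": filename,
        "size": slices_left_in_file(first_write, frequency, file_end),
        "next_index": 1,   # bookkeeping always starts at 1 in this variation
    }

# Daily file, 6-hourly output relative to 21z, run starting at 18z:
late_start = create_file(
    "something_20040110.nc4", set(),
    datetime(2004, 1, 10, 21), timedelta(hours=6), datetime(2004, 1, 11),
)
# late_start["size"] == 1: only the 21z slice lands in this file
```

A file whose first write is at 3z on the same day would get a size of 4, so files from the middle of a run and files from its first day differ in length.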
- Each time History decides to write, it evaluates the template. This IS the file that will be written to, no more, no less. The evaluated template is based on "some time" provided by History (what time that is depends on things like whether the collection is time-averaged).
- If the file has not been written to during this execution, check whether the file we want to write to already exists; if so, die. If not, determine the TOTAL NUMBER OF TIME INDICES that go into this file based on the frequency, ref_time, and template. Create said file.
- Write to the appropriate index based on the time, given the frequency and reference time. Note this may mean some time slices will never be written to, so some variables will be undefined for a given time range. Better compress these...
Like the above, but this time rather than dying if the file exists, just use it (rather than creating a file), presuming a previous execution created it "correctly". Just write to the appropriate index based on time; if the previous execution did its job right, that will just work.
We could still end up with files at the beginning or end of a long run with missing data.
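The fixed-slot indexing used by the last two variations can be worked through for the 6-hourly, 21z example. This is a sketch under assumptions: the frequency is a constant `timedelta`, `ref_time` is anchored to an arbitrary date at 21z (the proposal gives it as a time of day only), and `file_start`/`file_end` are supplied by the caller as the half-open range of times the template maps to one file. The function names are hypothetical.

```python
from datetime import datetime, timedelta

def first_slot(file_start, ref_time, frequency):
    """First time on the grid ref_time + k*frequency at or after file_start."""
    steps = -((ref_time - file_start) // frequency)   # ceil((file_start - ref_time) / frequency)
    return ref_time + steps * frequency

def total_slots(file_start, file_end, ref_time, frequency):
    """Number of grid times in [file_start, file_end)."""
    first = first_slot(file_start, ref_time, frequency)
    return max(0, -((first - file_end) // frequency))

def slot_index(when, file_start, file_end, ref_time, frequency):
    """1-based time index of `when` within its file, independent of run start."""
    offset = when - first_slot(file_start, ref_time, frequency)
    if not (file_start <= when < file_end) or offset % frequency:
        raise ValueError(f"{when} is not a write time for this file")
    return offset // frequency + 1

day = datetime(2004, 1, 10)
six_hours = timedelta(hours=6)
ref = datetime(2004, 1, 9, 21)                      # any date at 21z
n = total_slots(day, day + timedelta(days=1), ref, six_hours)
idx = slot_index(datetime(2004, 1, 10, 21), day, day + timedelta(days=1), ref, six_hours)
# n == 4 (3z, 9z, 15z, 21z) and idx == 4, even if 21z is the first write of the run
```

The same function serves both variations: whether an existing file causes an abort or is reused, the index depends only on the time, frequency, and reference time, which is what leaves the earlier slots unwritten after a late start.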