Monthly NRT #168

Merged 8 commits on Sep 30, 2024 (diff below shows changes from 3 commits)
108 changes: 108 additions & 0 deletions doc/operation.md
@@ -0,0 +1,108 @@
# Sea Ice E-CDR Operations

This document outlines how the `seaice_ecdr` code is leveraged in operations.


## Command-line Interface (CLI)

The `./scripts/cli.sh` script is the primary entrypoint for interacting with the
CLI. Not all CLI subcommands are used in operations (some are used exclusively
in development). The CLI subcommands outlined below are the ones that should be
used in production.

**NOTE**: on NSIDC production VMs, the CLI is set up to be available on the
system PATH as `ecdr`, e.g.:

```
$ ecdr --help
```

## G10016 NRT Processing

NRT data will be written to
`/share/apps/G10016_V3/v03r00/production/complete/`. The contents of this
directory should be rsynced to `/disks/sidads_ftp/pub/DATASETS/NOAA/G10016_V3`
after each G10016 processing job completes successfully.
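A minimal sketch of the post-job sync step, assuming `rsync -a` (archive mode) is acceptable for ops; the flags and the `build_rsync_cmd`/`sync_outputs` helpers are hypothetical, not the documented ops configuration. The trailing slash on the source path makes rsync copy the *contents* of `complete/` rather than the directory itself.

```python
import shlex
import subprocess
from pathlib import Path


def build_rsync_cmd(src_dir: Path, dest_dir: Path) -> list[str]:
    """Build an rsync command copying the contents of src_dir into dest_dir.

    Hypothetical helper: the `-a` flag is an assumption, not the ops config.
    """
    # Trailing slash on the source: copy the directory's contents, not the directory.
    return ["rsync", "-a", f"{src_dir}/", f"{dest_dir}/"]


def sync_outputs(src_dir: Path, dest_dir: Path, dry_run: bool = True) -> list[str]:
    cmd = build_rsync_cmd(src_dir, dest_dir)
    if dry_run:
        # Only show what would be run.
        print(shlex.join(cmd))
    else:
        # check=True raises CalledProcessError if rsync exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


cmd = sync_outputs(
    Path("/share/apps/G10016_V3/v03r00/production/complete"),
    Path("/disks/sidads_ftp/pub/DATASETS/NOAA/G10016_V3"),
)
```

Running the sync only after the processing job has exited successfully keeps partially written outputs off the FTP site.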

### Daily processing

Daily NRT processing should occur by running this command:

```
daily-nrt --last-n-days 5 --hemisphere both
```

Note that the `--overwrite` flag can be used to re-create NRT data if, e.g., a
data gap is filled a few days late.

> Contributor review comment: Do you want to explicitly give the command to use
> the overwrite flag? E.g., `daily-nrt --last-n-days 5 --hemisphere both --overwrite`
> ...in this README.md?
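The reviewer's suggestion can be captured in a small wrapper that appends `--overwrite` only when a late-filled data gap needs re-processing. This is a hypothetical sketch: it assumes the `ecdr` executable is on the PATH (as noted for NSIDC production VMs) and uses only the `daily-nrt` flags shown in this section.

```python
import shlex
import subprocess


def daily_nrt_cmd(
    last_n_days: int = 5, hemisphere: str = "both", overwrite: bool = False
) -> list[str]:
    """Build the `ecdr daily-nrt` argument list (hypothetical helper)."""
    cmd = [
        "ecdr", "daily-nrt",
        "--last-n-days", str(last_n_days),
        "--hemisphere", hemisphere,
    ]
    if overwrite:
        # Re-create NRT outputs that already exist, e.g. after a late-filled data gap.
        cmd.append("--overwrite")
    return cmd


def run_daily_nrt(**kwargs) -> None:
    # check=True turns a non-zero exit status into a CalledProcessError.
    subprocess.run(daily_nrt_cmd(**kwargs), check=True)


print(shlex.join(daily_nrt_cmd(overwrite=True)))
# ecdr daily-nrt --last-n-days 5 --hemisphere both --overwrite
```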

### Monthly processing

**TODO**: the code does not yet support this.


## G02202 "final" Processing

Final data will be written to
`/share/apps/G02202_V5/v05r00/production/complete/`. The contents of this
directory should be rsynced to `/disks/sidads_ftp/pub/DATASETS/NOAA/G02202_V5`
after each G02202 processing job completes successfully.

Typically, "final" processing occurs all at once, as data becomes
finalized/available for NSIDC-0001. In other words, the following steps do not
need to be run on a daily/monthly basis, but can instead be bundled into one job. See
[the ops job for
v4](https://ci.jenkins-ops-2022.apps.int.nsidc.org/job/G02202_Generate_Dataset_Production)
as an example.

### Daily processing

To create daily data:

```
daily --start-date YYYY-MM-DD --end-date YYYY-MM-DD --hemisphere {north|south}
```

Once daily data for a year is available, this data should be aggregated with the
`daily-aggregate` command:

```
daily-aggregate --year YYYY --hemisphere {north|south}
```

There will be one daily aggregate file per year per hemisphere.
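Because there is one daily aggregate file per year per hemisphere, a date range that crosses year boundaries needs one `daily-aggregate` call per calendar year it touches. A minimal sketch (the helper name is hypothetical; whether a partially processed year should already be aggregated is not specified here, so this simply lists every year touched):

```python
from datetime import date


def daily_aggregate_cmds(start: date, end: date, hemisphere: str) -> list[str]:
    """One `daily-aggregate` call per calendar year touched by [start, end]."""
    if end < start:
        raise ValueError("end date precedes start date")
    return [
        f"ecdr daily-aggregate --year {year} --hemisphere {hemisphere}"
        for year in range(start.year, end.year + 1)
    ]


for cmd in daily_aggregate_cmds(date(2022, 11, 1), date(2024, 2, 29), "north"):
    print(cmd)
```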

### Monthly processing

When a month's worth of daily data is available, monthly data files can be produced:

```
monthly --year YYYY --month MM --hemisphere {north|south}
```

A range of years/months can also be specified:


```
monthly --year YYYY --month MM --end-year YYYY --end-month MM --hemisphere {north|south}
```
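The range form is inclusive of both the start and end months, and a missing `--end-year`/`--end-month` falls back to the start value (the CLI code in this PR does this, then loops with `pd.period_range`). A stdlib-only sketch of that expansion; `months_in_range` is a hypothetical name:

```python
from __future__ import annotations


def months_in_range(
    year: int,
    month: int,
    end_year: int | None = None,
    end_month: int | None = None,
) -> list[tuple[int, int]]:
    """Expand --year/--month [--end-year/--end-month] into inclusive (year, month) pairs."""
    # As in the CLI code, a missing end year/month defaults to the start value.
    end_year = year if end_year is None else end_year
    end_month = month if end_month is None else end_month
    # Count months on a single axis so the range can cross year boundaries.
    start_idx = year * 12 + (month - 1)
    end_idx = end_year * 12 + (end_month - 1)
    if end_idx < start_idx:
        raise ValueError("end month precedes start month")
    return [(i // 12, i % 12 + 1) for i in range(start_idx, end_idx + 1)]


print(months_in_range(2023, 11, end_year=2024, end_month=2))
# [(2023, 11), (2023, 12), (2024, 1), (2024, 2)]
```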

Each time a new monthly file is produced, the monthly aggregate file should be
updated. There is only ever one monthly aggregate file per hemisphere:

```
monthly-aggregate --hemisphere {north|south}
```

### Validation

Each time finalized data is produced, the validation CLI should be run:


```
validate-outputs --hemisphere {north|south} --start-date YYYY-MM-DD --end-date YYYY-MM-DD
```

This produces log files in
`/share/apps/G02202_V5/v05r00_outputs/production/validation/` that should be
published to the production location. **TODO**: confirm this is accurate; it
does not look like v4 does this.
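The "final" steps above can be chained into one bundled job. The sketch below only *builds* the ordered `ecdr` calls for both hemispheres; the helper name, the ordering, and the requirement that the run covers whole months (start on the 1st, end on a month's last day, so every monthly file has a full month of daily data) are assumptions, not the ops job's actual definition.

```python
from calendar import monthrange
from datetime import date

HEMISPHERES = ("north", "south")


def final_job_cmds(start: date, end: date) -> list[str]:
    """Ordered `ecdr` calls for one bundled G02202 "final" run (hypothetical)."""
    if start.day != 1 or end.day != monthrange(end.year, end.month)[1]:
        raise ValueError("start/end must align with whole months")
    s, e = f"{start:%Y-%m-%d}", f"{end:%Y-%m-%d}"
    cmds = []
    for hemi in HEMISPHERES:
        cmds.append(f"ecdr daily --start-date {s} --end-date {e} --hemisphere {hemi}")
        # One daily aggregate file per year per hemisphere.
        cmds += [
            f"ecdr daily-aggregate --year {y} --hemisphere {hemi}"
            for y in range(start.year, end.year + 1)
        ]
        cmds.append(
            f"ecdr monthly --year {start.year} --month {start.month}"
            f" --end-year {end.year} --end-month {end.month} --hemisphere {hemi}"
        )
        # Refresh the single monthly aggregate file for this hemisphere.
        cmds.append(f"ecdr monthly-aggregate --hemisphere {hemi}")
        cmds.append(
            f"ecdr validate-outputs --hemisphere {hemi} --start-date {s} --end-date {e}"
        )
    return cmds


for cmd in final_job_cmds(date(2024, 1, 1), date(2024, 3, 31)):
    print(cmd)
```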
4 changes: 3 additions & 1 deletion seaice_ecdr/cli/entrypoint.py
@@ -4,6 +4,7 @@

from seaice_ecdr.cli.daily import cli as daily_cli
from seaice_ecdr.cli.monthly import cli as monthly_cli
from seaice_ecdr.cli.monthly_nrt import cli as monthly_nrt_cli
from seaice_ecdr.cli.nrt import cli as nrt_cli
from seaice_ecdr.daily_aggregate import cli as daily_aggregate_cli
from seaice_ecdr.initial_daily_ecdr import cli as ecdr_cli
@@ -43,9 +44,10 @@ def cli():
cli.add_command(monthly_cli)
# Generate monthly aggregate file (one per hemisphere)
cli.add_command(monthly_aggregate_cli)
# Wraps the `nrt_ecdr_for_dates` CLI with the correct platform start date
# Wraps the NRT CLIs with the correct platform start date
# configuration chosen.
cli.add_command(nrt_cli)
cli.add_command(monthly_nrt_cli)

if __name__ == "__main__":
    cli()
1 change: 1 addition & 0 deletions seaice_ecdr/cli/monthly.py
@@ -72,6 +72,7 @@ def make_monthly_25km_ecdr(
base_output_dir=base_output_dir,
hemisphere=hemisphere,
resolution=RESOLUTION,
is_nrt=False,
)


135 changes: 135 additions & 0 deletions seaice_ecdr/cli/monthly_nrt.py
@@ -0,0 +1,135 @@
from pathlib import Path
from typing import Final, get_args

import click
import pandas as pd
from pm_tb_data._types import Hemisphere

from seaice_ecdr.cli.util import CLI_EXE_PATH, run_cmd
from seaice_ecdr.constants import DEFAULT_BASE_NRT_OUTPUT_DIR
from seaice_ecdr.platforms.config import (
    NRT_PLATFORM_START_DATES_CONFIG_FILEPATH,
)
from seaice_ecdr.publish_monthly import prepare_monthly_nc_for_publication


def make_monthly_25km_ecdr(
    year: int,
    month: int,
    end_year: int | None,
    end_month: int | None,
    hemisphere: Hemisphere,
    base_output_dir: Path,
):
    if end_year is None:
        end_year = year
    if end_month is None:
        end_month = month

    # TODO: consider extracting these to CLI options that default to these values.
    RESOLUTION: Final = "25"
    ANCILLARY_SOURCE: Final = "CDRv5"
    # TODO: the amsr2 start date should ideally be read from the platform start
    # date config.
    # Point the subprocess at the NRT platform start dates config.
    run_cmd(
        f"export PLATFORM_START_DATES_CONFIG_FILEPATH={NRT_PLATFORM_START_DATES_CONFIG_FILEPATH} &&"
        f" {CLI_EXE_PATH} intermediate-monthly"
        f" --year {year} --month {month}"
        f" --end-year {end_year} --end-month {end_month}"
        f" --hemisphere {hemisphere}"
        f" --base-output-dir {base_output_dir}"
        f" --resolution {RESOLUTION}"
        f" --ancillary-source {ANCILLARY_SOURCE}"
        " --is-nrt"
    )

    # Prepare the monthly data for publication
    for period in pd.period_range(
        start=pd.Period(year=year, month=month, freq="M"),
        end=pd.Period(year=end_year, month=end_month, freq="M"),
        freq="M",
    ):
        prepare_monthly_nc_for_publication(
            year=period.year,
            month=period.month,
            base_output_dir=base_output_dir,
            hemisphere=hemisphere,
            resolution=RESOLUTION,
            is_nrt=True,
        )


@click.command(name="monthly-nrt")
@click.option(
    "--year",
    required=True,
    type=int,
    help="Year for which to create the monthly file.",
)
@click.option(
    "--month",
    required=True,
    type=int,
    help="Month for which to create the monthly file.",
)
@click.option(
    "--end-year",
    required=False,
    default=None,
    type=int,
    help="If given, the end year for which to create monthly files.",
)
@click.option(
    "--end-month",
    required=False,
    default=None,
    type=int,
    help="If given, the end month for which to create monthly files.",
)
@click.option(
    "-h",
    "--hemisphere",
    required=True,
    type=click.Choice(get_args(Hemisphere)),
)
@click.option(
    "--base-output-dir",
    required=True,
    type=click.Path(
        exists=True,
        file_okay=False,
        dir_okay=True,
        writable=True,
        resolve_path=True,
        path_type=Path,
    ),
    default=DEFAULT_BASE_NRT_OUTPUT_DIR,
    help=(
        "Base output directory for NRT ECDR outputs."
        " Subdirectories are created for outputs of"
        " different stages of processing."
    ),
    show_default=True,
)
def cli(
    *,
    year: int,
    month: int,
    end_year: int | None,
    end_month: int | None,
    hemisphere: Hemisphere,
    base_output_dir: Path,
) -> None:
    make_monthly_25km_ecdr(
        year=year,
        month=month,
        end_year=end_year,
        end_month=end_month,
        hemisphere=hemisphere,
        base_output_dir=base_output_dir,
    )


if __name__ == "__main__":
    cli()
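`make_monthly_25km_ecdr` selects the NRT platform start dates by prefixing the shell command with `export ... &&`. One alternative worth considering, sketched here on the assumption that `run_cmd` only needs to execute a subprocess, is to pass the variable through `subprocess.run(env=...)`: it avoids relying on shell syntax and scopes the setting to the child process. The child below is a stand-in Python one-liner rather than the real `intermediate-monthly` call, and the config path is hypothetical.

```python
import os
import subprocess
import sys


def run_with_platform_config(
    cmd: list[str], config_fp: str
) -> subprocess.CompletedProcess:
    """Run `cmd` with PLATFORM_START_DATES_CONFIG_FILEPATH set only for the child."""
    env = {**os.environ, "PLATFORM_START_DATES_CONFIG_FILEPATH": config_fp}
    # check=True raises CalledProcessError on a non-zero exit, like a failed `&&` chain.
    return subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)


# Stand-in child process that just echoes the variable it sees.
result = run_with_platform_config(
    [
        sys.executable,
        "-c",
        "import os; print(os.environ['PLATFORM_START_DATES_CONFIG_FILEPATH'])",
    ],
    "/tmp/nrt_platform_start_dates.yml",  # hypothetical path
)
print(result.stdout.strip())
```

The parent process's environment is left untouched, so later commands in the same Python process are not affected by the override.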
15 changes: 14 additions & 1 deletion seaice_ecdr/intermediate_monthly.py
@@ -76,6 +76,7 @@ def _get_daily_complete_filepaths_for_month(
intermediate_output_dir: Path,
hemisphere: Hemisphere,
resolution: ECDR_SUPPORTED_RESOLUTIONS,
is_nrt: bool,
) -> list[Path]:
"""Return a list of paths to ECDR daily complete filepaths for the given year and month."""
data_list = []
@@ -93,7 +94,7 @@ def _get_daily_complete_filepaths_for_month(
resolution=resolution,
intermediate_output_dir=intermediate_output_dir,
platform_id=platform.id,
is_nrt=False,
is_nrt=is_nrt,
)
if expected_fp.is_file():
data_list.append(expected_fp)
@@ -135,6 +136,7 @@ def get_daily_ds_for_month(
intermediate_output_dir: Path,
hemisphere: Hemisphere,
resolution: ECDR_SUPPORTED_RESOLUTIONS,
is_nrt: bool,
) -> xr.Dataset:
"""Create an xr.Dataset wtih ECDR complete daily data for a given year and month.

@@ -148,6 +150,7 @@ def get_daily_ds_for_month(
intermediate_output_dir=intermediate_output_dir,
hemisphere=hemisphere,
resolution=resolution,
is_nrt=is_nrt,
)
# Read all of the complete daily data for the given year and month.
ds = xr.open_mfdataset(data_list)
@@ -636,13 +639,15 @@ def make_intermediate_monthly_nc(
intermediate_output_dir: Path,
resolution: ECDR_SUPPORTED_RESOLUTIONS,
ancillary_source: ANCILLARY_SOURCES,
is_nrt: bool,
) -> Path:
daily_ds_for_month = get_daily_ds_for_month(
year=year,
month=month,
intermediate_output_dir=intermediate_output_dir,
hemisphere=hemisphere,
resolution=resolution,
is_nrt=is_nrt,
)

platform_id = daily_ds_for_month.platform_id
@@ -745,6 +750,12 @@ def make_intermediate_monthly_nc(
required=True,
type=click.Choice(get_args(ANCILLARY_SOURCES)),
)
@click.option(
    "--is-nrt",
    required=False,
    is_flag=True,
    help="Create intermediate monthly file in NRT mode (uses NRT-style filename).",
)
def cli(
*,
year: int,
@@ -755,6 +766,7 @@ def cli(
base_output_dir: Path,
resolution: ECDR_SUPPORTED_RESOLUTIONS,
ancillary_source: ANCILLARY_SOURCES,
is_nrt: bool,
):
if end_year is None:
end_year = year
@@ -779,6 +791,7 @@ def cli(
hemisphere=hemisphere,
resolution=resolution,
ancillary_source=ancillary_source,
is_nrt=is_nrt,
)
except Exception:
error_periods.append(period)
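The `intermediate-monthly` CLI wraps each month in `try`/`except` and records failures in `error_periods`, so one bad month does not stop the rest of a range. A stdlib sketch of that pattern; what the real `cli` does with `error_periods` after the loop is outside this hunk, so raising a summary error at the end is an assumption, and `process_periods` is a hypothetical name.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger(__name__)


def process_periods(
    periods: Iterable[tuple[int, int]],
    make_one: Callable[[int, int], None],
) -> None:
    """Attempt every (year, month); collect failures instead of stopping at the first."""
    error_periods = []
    for year, month in periods:
        try:
            make_one(year, month)
        except Exception:
            # Record the traceback for this month, then keep going.
            logger.exception("Failed to create monthly file for %d-%02d", year, month)
            error_periods.append((year, month))
    if error_periods:
        raise RuntimeError(f"Failed for {len(error_periods)} period(s): {error_periods}")


attempted = []


def fake_make(year: int, month: int) -> None:
    # Stand-in for make_intermediate_monthly_nc that fails for February.
    attempted.append((year, month))
    if month == 2:
        raise ValueError("missing daily data")
```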
16 changes: 16 additions & 0 deletions seaice_ecdr/nrt.py
@@ -22,6 +22,7 @@
)
from seaice_ecdr.intermediate_daily import (
complete_daily_ecdr_ds,
get_ecdr_filepath,
)
from seaice_ecdr.publish_daily import (
get_complete_daily_filepath,
@@ -291,6 +292,21 @@ def nrt_ecdr_for_day(
is_nrt=True,
ancillary_source=ancillary_source,
)
# Write the daily intermediate file. This is used by the monthly NRT
# processing to produce the monthly fields.
cde_ds_filepath = get_ecdr_filepath(
date=date,
hemisphere=hemisphere,
resolution=NRT_RESOLUTION,
intermediate_output_dir=intermediate_output_dir,
platform_id=NRT_PLATFORM_ID,
is_nrt=True,
)
cde_ds.to_netcdf(
cde_ds_filepath,
)

# Prepare the ds for publication
daily_ds = make_publication_ready_ds(
intermediate_daily_ds=cde_ds,
hemisphere=hemisphere,
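Why `is_nrt` has to be threaded through: `_get_daily_complete_filepaths_for_month` previously hard-coded `is_nrt=False`, so it could only find final-style daily files, while `nrt.py` now writes NRT daily intermediates specifically for monthly NRT processing. The sketch below mimics the lookup (one expected path per day of the month, keeping only files that exist). The filename scheme and both helpers are hypothetical stand-ins for `get_ecdr_filepath`.

```python
import calendar
import tempfile
from datetime import date
from pathlib import Path


def expected_daily_fp(out_dir: Path, day: date, hemisphere: str, is_nrt: bool) -> Path:
    # Hypothetical naming scheme; the real one comes from get_ecdr_filepath().
    kind = "nrt" if is_nrt else "final"
    return out_dir / f"ecdr_{kind}_{hemisphere}_{day:%Y%m%d}.nc"


def daily_fps_for_month(
    out_dir: Path, year: int, month: int, hemisphere: str, is_nrt: bool
) -> list[Path]:
    """Return existing daily files for one month, silently skipping missing days."""
    _, n_days = calendar.monthrange(year, month)
    fps = []
    for d in range(1, n_days + 1):
        fp = expected_daily_fp(out_dir, date(year, month, d), hemisphere, is_nrt)
        if fp.is_file():
            fps.append(fp)
    return fps


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    # Simulate NRT daily runs that wrote intermediate files for 1-3 Sept only.
    for d in (1, 2, 3):
        expected_daily_fp(out_dir, date(2024, 9, d), "north", is_nrt=True).touch()
    nrt_fps = daily_fps_for_month(out_dir, 2024, 9, "north", is_nrt=True)
    # With the old hard-coded is_nrt=False, none of the NRT files are found.
    final_fps = daily_fps_for_month(out_dir, 2024, 9, "north", is_nrt=False)
    print(len(nrt_fps), len(final_fps))  # 3 0
```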