
where to interact with polar.pangeo.io? #481

Closed
rabernat opened this issue Nov 15, 2018 · 13 comments
@rabernat
Member

@NicWayand published a great blog post today about polar.pangeo.io!
https://medium.com/pangeo/polar-deployment-of-pangeo-96865774287c

He concludes by asking other people to get involved by using the cluster and/or adding datasets:

Request or add new datasets by submitting an Issue or emailing me at [email protected]

The repo referenced here is the general pangeo one. But I think it would be best to have a dedicated forum where current and potential polar.pangeo.io users can interact. Currently the cluster is deployed from https://github.com/NicWayand/polar.pangeo.io-deploy, which doesn't have an issue tracker because it's a fork.

Would it make sense to move that repo here with the other pangeo deploy repos, and to un-fork it so it is a standalone, full-fledged repo? More generally, what sort of interaction between users and cluster admins do we want to encourage?

Somewhat related to #476.

@rabernat
Member Author

I will use this thread to discuss another issue I have encountered with the polar datasets.

I am loading a dataset like this:

import intake

# open the polar catalog and lazily load the NSIDC_0081 entry as a dask-backed dataset
catalog_url = 'https://raw.githubusercontent.com/NicWayand/polar.pangeo.io-deploy/staging/deployments/polar.pangeo.io/image/catalog.yaml'
cat = intake.Catalog(catalog_url)
ds_nsidc = cat.NSIDC_0081.to_dask()
ds_nsidc

The dataset looks like this:

<xarray.Dataset>
Dimensions:    (time: 1384, x: 304, y: 448)
Coordinates:
    hole_mask  (y, x) int8 dask.array<shape=(448, 304), chunksize=(448, 304)>
    lat        (x, y) float64 dask.array<shape=(304, 448), chunksize=(304, 448)>
    lon        (x, y) float64 dask.array<shape=(304, 448), chunksize=(304, 448)>
  * time       (time) datetime64[ns] 2015-01-01 2015-01-02 2015-01-03 ...
  * x          (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
    xm         (x) int64 dask.array<shape=(304,), chunksize=(304,)>
  * y          (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
    ym         (y) int64 dask.array<shape=(448,), chunksize=(448,)>
Data variables:
    area       (time) float64 dask.array<shape=(1384,), chunksize=(1,)>
    extent     (time) float64 dask.array<shape=(1384,), chunksize=(1,)>
    sic        (time, y, x) float64 dask.array<shape=(1384, 448, 304), chunksize=(1, 448, 304)>

Notably, there is no metadata anywhere in this dataset: both the dataset-level and variable-level attributes are empty. Consequently, it is very hard for a user to know what they are looking at. NetCDF files distributed by official data providers strive to be CF-compliant; when we put data in the cloud, we should preserve that metadata as much as possible.

I'm curious how these datasets were produced and how we might go about recovering the metadata. Particularly important is information about the map projection.
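
One possible path to recovery, as a sketch: if the original CF-compliant NetCDF files are still available, their attributes could be copied into the Zarr version. All file and store paths below are hypothetical.

```
import xarray as xr

ds_src = xr.open_dataset('nsidc_0081_original.nc')  # hypothetical source NetCDF
ds_zarr = xr.open_zarr('nsidc_0081.zarr')           # hypothetical Zarr store

# copy dataset-level attributes, then per-variable attributes
ds_zarr.attrs.update(ds_src.attrs)
for name in ds_zarr.variables:
    if name in ds_src.variables:
        ds_zarr[name].attrs.update(ds_src[name].attrs)

# write a new store carrying the recovered metadata
ds_zarr.to_zarr('nsidc_0081_with_metadata.zarr', mode='w')
```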

For an example of a CF-compliant dataset, you can look at this one from the Altimetry Analysis Use Case. In addition to the dataset-level metadata, there is also metadata attached to each variable.

```
<xarray.Dataset>
Dimensions:    (latitude: 720, longitude: 1440, nv: 2, time: 8901)
Coordinates:
    crs        int32 ...
    lat_bnds   (time, latitude, nv) float32 dask.array
  * latitude   (latitude) float32 -89.875 -89.625 -89.375 -89.125 -88.875 ...
    lon_bnds   (longitude, nv) float32 dask.array
  * longitude  (longitude) float32 0.125 0.375 0.625 0.875 1.125 1.375 1.625 ...
  * nv         (nv) int32 0 1
  * time       (time) datetime64[ns] 1993-01-01 1993-01-02 1993-01-03 ...
Data variables:
    adt        (time, latitude, longitude) float64 dask.array
    err        (time, latitude, longitude) float64 dask.array
    sla        (time, latitude, longitude) float64 dask.array
    ugos       (time, latitude, longitude) float64 dask.array
    ugosa      (time, latitude, longitude) float64 dask.array
    vgos       (time, latitude, longitude) float64 dask.array
    vgosa      (time, latitude, longitude) float64 dask.array
Attributes:
    Conventions:                     CF-1.6
    Metadata_Conventions:            Unidata Dataset Discovery v1.0
    cdm_data_type:                   Grid
    comment:                         Sea Surface Height measured by Altimetry...
    contact:                         [email protected]
    creator_email:                   [email protected]
    creator_name:                    CMEMS - Sea Level Thematic Assembly Center
    creator_url:                     http://marine.copernicus.eu
    date_created:                    2014-02-26T16:09:13Z
    date_issued:                     2014-01-06T00:00:00Z
    date_modified:                   2015-11-10T19:42:51Z
    geospatial_lat_max:              89.875
    geospatial_lat_min:              -89.875
    geospatial_lat_resolution:       0.25
    geospatial_lat_units:            degrees_north
    geospatial_lon_max:              359.875
    geospatial_lon_min:              0.125
    geospatial_lon_resolution:       0.25
    geospatial_lon_units:            degrees_east
    geospatial_vertical_max:         0.0
    geospatial_vertical_min:         0.0
    geospatial_vertical_positive:    down
    geospatial_vertical_resolution:  point
    geospatial_vertical_units:       m
    history:                         2014-02-26T16:09:13Z: created by DUACS D...
    institution:                     CLS, CNES
    keywords:                        Oceans > Ocean Topography > Sea Surface ...
    keywords_vocabulary:             NetCDF COARDS Climate and Forecast Stand...
    license:                         http://marine.copernicus.eu/web/27-servi...
    platform:                        ERS-1, Topex/Poseidon
    processing_level:                L4
    product_version:                 5.0
    project:                         COPERNICUS MARINE ENVIRONMENT MONITORING...
    references:                      http://marine.copernicus.eu
    source:                          Altimetry measurements
    ssalto_duacs_comment:            The reference mission used for the altim...
    standard_name_vocabulary:        NetCDF Climate and Forecast (CF) Metadat...
    summary:                         SSALTO/DUACS Delayed-Time Level-4 sea su...
    time_coverage_duration:          P1D
    time_coverage_end:               1993-01-01T12:00:00Z
    time_coverage_resolution:        P1D
    time_coverage_start:             1992-12-31T12:00:00Z
    title:                           DT merged all satellites Global Ocean Gr...
```

@martindurant
Contributor

If you opened the same data with a gcsfs mapper directly, would you see the metadata attributes?
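
A minimal sketch of that check, assuming anonymous read access; the bucket path for the NSIDC_0081 store is hypothetical (substitute the real location from the catalog):

```
import gcsfs
import xarray as xr

# open the Zarr store directly through a gcsfs mapper, bypassing intake
fs = gcsfs.GCSFileSystem(token='anon')
mapper = fs.get_mapper('pangeo-data/path/to/NSIDC_0081.zarr')  # hypothetical path
ds = xr.open_zarr(mapper)
print(ds.attrs)         # dataset-level attributes, if present in the store
print(ds['sic'].attrs)  # per-variable attributes
```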

@NicWayand
Member

Thanks @rabernat for bringing this up. The metadata is missing and I will raise an issue to add it back in.

@NicWayand
Member

On a similar thread... does anyone have suggestions for how to get a DOI for a Zarr dataset that is updated daily in a Google Cloud bucket? Ideally the DOI would point to the most recent version, but I am fine with freezing a snapshot for the DOI. I have used sites like Zenodo before, but I would have to tar the Zarr store first before uploading, which seems inefficient.
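
For reference, the tar workaround described here is small; the paths are hypothetical:

```
import tarfile

# bundle the Zarr directory tree into a single archive that Zenodo can accept
with tarfile.open('nsidc_0081.zarr.tar.gz', 'w:gz') as tar:
    tar.add('nsidc_0081.zarr', arcname='nsidc_0081.zarr')
```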

@NicWayand
Member

OK @rabernat, metadata added. I hope it is useful for your class now. If you (or anyone else) find any issues with the metadata, please let me know.

<xarray.Dataset>
Dimensions:      (fore_time: 52, init_end: 48, model: 20, x: 304, y: 448)
Coordinates:
    crs          object ...
  * fore_time    (fore_time) timedelta64[ns] 0 days 7 days 14 days 21 days ...
  * init_end     (init_end) datetime64[ns] 2018-01-07 2018-01-14 2018-01-21 ...
    init_start   (init_end) datetime64[ns] dask.array<shape=(48,), chunksize=(48,)>
    lat          (x, y) float64 dask.array<shape=(304, 448), chunksize=(152, 224)>
    lon          (x, y) float64 dask.array<shape=(304, 448), chunksize=(152, 224)>
  * model        (model) object 'Observed' 'awispin' 'climatology' ...
    valid_end    (init_end, fore_time) datetime64[ns] dask.array<shape=(48, 52), chunksize=(48, 52)>
    valid_start  (init_end, fore_time) datetime64[ns] dask.array<shape=(48, 52), chunksize=(48, 52)>
  * x            (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
  * y            (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
Data variables:
    SIP          (init_end, model, fore_time, y, x) float64 dask.array<shape=(48, 20, 52, 448, 304), chunksize=(1, 1, 1, 448, 304)>
    anomaly      (init_end, model, fore_time, y, x) float64 dask.array<shape=(48, 20, 52, 448, 304), chunksize=(1, 1, 1, 448, 304)>
    mean         (init_end, model, fore_time, y, x) float64 dask.array<shape=(48, 20, 52, 448, 304), chunksize=(1, 1, 1, 448, 304)>
Attributes:
    comment:                    Weekly mean sea ice concentration forecasted ...
    contact:                    [email protected]
    creator_email:              [email protected]
    creator_name:               Nicholas Wayand, University of Washington
    creator_url:                https://atmos.uw.edu/sipn/
    date_created:               2018-12-03T00:00:00
    date_modified:              2018-12-04T16:02:19
    geospatial_lat_max:         89.83682
    geospatial_lat_min:         31.102670000000003
    geospatial_lat_resolution:  ~25km
    geospatial_lat_units:       degrees_north
    geospatial_lon_max:         179.81398000000002
    geospatial_lon_min:         -180.00000000000003
    geospatial_lon_resolution:  ~25km
    geospatial_lon_units:       degrees_east
    history:                    2018-12-04T16:02:19: updated by Nicholas Wayand
    institution:                UW, SIPN, ARCUS
    keywords:                   Arctic > Sea ice concentration > Prediction
    product_version:            1.0
    project:                    Sea Ice Prediction Network Phase II
    references:                 Wayand, N.E., Bitz, C.M., and E. Blanchard-Wr...
    source:                     Numerical model predictions and Passive micro...
    summary:                    Dataset is updated daily with weekly sea ice ...
    time_coverage_end:          2019-11-24T00:00:00
    time_coverage_start:        2018-01-01T00:00:00
    title:                      SIPN2 Sea ice Concentration Forecasts and Obs...
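
A quick way to confirm the new attributes come through the catalog, sketched with the same pattern as earlier in the thread; the entry name here is hypothetical (use the actual key from catalog.yaml):

```
import intake

catalog_url = 'https://raw.githubusercontent.com/NicWayand/polar.pangeo.io-deploy/staging/deployments/polar.pangeo.io/image/catalog.yaml'
cat = intake.Catalog(catalog_url)
ds = cat.sea_ice_forecasts.to_dask()  # hypothetical entry name for this dataset
print(ds.attrs['title'])   # dataset-level metadata now present
print(ds['SIP'].attrs)     # per-variable metadata, if added
```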

@stale

stale bot commented Feb 4, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 4, 2019
@stale

stale bot commented Feb 11, 2019

This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.

@stale stale bot closed this as completed Feb 11, 2019
@rabernat
Member Author

Is polar.pangeo.io being used?

This cluster has had 3 compute instances running constantly since November, at a cost of about $300 per month. We have not heard much from @NicWayand since the initial setup. If the cluster is being used, great, carry on! If not, let's assess whether it makes sense to keep paying for this.

@rabernat rabernat reopened this Mar 24, 2019
@stale stale bot removed the stale label Mar 24, 2019
@jhamman
Member

jhamman commented Mar 25, 2019

FWIW, I spent a few minutes cleaning up the polar cluster today. It will now idle at its intended 2 compute instances.

@rabernat
Member Author

rabernat commented Apr 5, 2019

Any update from @NicWayand? Should we shut this cluster down? Consolidate with others?

xref pangeo-data/pangeo-cloud-federation#215

@stale

stale bot commented Jun 4, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 4, 2019
@stale

stale bot commented Jun 11, 2019

This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.

@stale stale bot closed this as completed Jun 11, 2019
@DaniJonesOcean

It seems that polar.pangeo.io has been shut down. Is that correct? Or was it consolidated with another deployment after all?

(Apologies if I have missed something.)
