s3 object store #600
Replies: 6 comments
-
Just as a comment: we need the iris library to read the data. Using an https request to the s3 bucket should be possible right away (since the netcdf library should have that part compiled in), but iris does not talk s3 natively.
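To make that distinction concrete, here is a minimal sketch (not pyaerocom code): it assumes a publicly readable bucket and a hypothetical object URL on the rgw.met.no test bucket. The netCDF library can read directly over plain https via its byte-range driver, while iris needs a local file, so the usual workaround is to download first.

```python
# Minimal sketch, not pyaerocom code: assumes a publicly readable bucket and a
# hypothetical object URL on the rgw.met.no test bucket.
import tempfile
import urllib.request

import iris
import netCDF4

url = "https://rgw.met.no/pya-test-bucket/example.nc"  # hypothetical object

# 1) plain https: the netCDF-C byte-range driver is enabled by appending
#    "#mode=bytes" to the URL, no s3-specific tooling needed
ds = netCDF4.Dataset(url + "#mode=bytes")
print(list(ds.variables))
ds.close()

# 2) iris does not talk s3 or plain https objects natively, so download to a
#    local temporary file and load that instead
with tempfile.NamedTemporaryFile(suffix=".nc") as tmp:
    urllib.request.urlretrieve(url, tmp.name)
    cubes = iris.load(tmp.name)
    print(cubes)  # note: iris loads lazily; realise data before tmp is deleted
```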
-
Reading briefly through the above-mentioned documentation at https://www.unidata.ucar.edu/blogs/developer/entry/overview-of-zarr-support-in, I see that the netcdf library actually has some S3 support built in now. But this cloud support lacks support for an unlimited dimension (which 98% of our data files have) and for netcdf strings (which around 5% of our data uses). Not sure if this is the right time to look further into it.
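If it helps to gauge the impact, a small audit along these lines could report which files hit those two limitations; the data directory below is a placeholder and would need to point at the actual data tree.

```python
# Rough sketch: scan netCDF files for the two NCZarr blockers mentioned above
# (unlimited dimensions, string variables). "/path/to/data" is a placeholder.
from pathlib import Path

import netCDF4


def nczarr_blockers(path):
    """Return (has_unlimited_dim, has_string_var) for one netCDF file."""
    with netCDF4.Dataset(path) as ds:
        has_unlimited = any(d.isunlimited() for d in ds.dimensions.values())
        has_string = any(v.dtype == str for v in ds.variables.values())
    return has_unlimited, has_string


for nc in Path("/path/to/data").rglob("*.nc"):
    unlimited, strings = nczarr_blockers(nc)
    if unlimited or strings:
        print(f"{nc}: unlimited_dim={unlimited} string_vars={strings}")
```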
-
The zarr/S3 support of …
-
Is it worth looking at this further at this time, then? Or do we just follow the netCDF library's development until we think it might work?
-
The development in …
-
Iris is more than a netcdf reader for pyaerocom. Examples are concatenation (including unifying the attributes), unit handling, and probably more I don't know about yet. So these things would need replacement as well if we are to replace the reader.
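For reference, a short sketch of the two iris features named above (attribute unification before concatenation, and unit handling); the file names are hypothetical, and this is only meant to illustrate what a replacement reader would have to cover.

```python
# Sketch of the iris features mentioned above; file names are hypothetical.
import iris
from iris.util import equalise_attributes

cubes = iris.load(["od550aer_2010.nc", "od550aer_2011.nc"])  # hypothetical files

# Concatenation along time, after unifying attributes that would otherwise
# block it (e.g. differing history/creation_date attributes per file).
equalise_attributes(cubes)
cube = cubes.concatenate_cube()

# Unit handling via cf-units: convert in place, with consistency checks
# (AOD is dimensionless, so "1" here; other variables need real conversions).
cube.convert_units("1")
print(cube.summary(shorten=True))
```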
-
- capability to use s3-compatible object stores for data needed to run the tests or required to be available outside of the internal network
- we have currently set up 2 test buckets for pyaerocom to use, one on a min.io server set up through kubernetes:
  https://minio.test-charlien.k8s.met.no/minio/pya-test-bucket/
  and one through the recently created Met object storage solution (https://gitlab.met.no/it/infra/ostack-doc/-/blob/master/External_objectstore_s3.md):
  https://rgw.met.no/6f4fcddae54549dbbe12044fa1dfda7c:pya-test-bucket
- interaction with these buckets can be done with a number of tools, like `rclone` or `s3cmd`, or python libraries such as `s3fs`, `boto`, or `boto3` (they require either a configuration file with credentials such as `aws_access_key_id` and `aws_secret_access_key`, or for these or similar credentials to be set up when creating a connection to the bucket); see the sketch after this list. I am unsure how to best share credentials in "production", but at this testing stage we can just make them public I guess?
- in principle netCDF tools can also interact directly with s3-compatible object stores (see https://www.unidata.ucar.edu/blogs/developer/entry/overview-of-zarr-support-in), but at the moment NCZarr support seems to still be buggy (Magnus has opened an issue with them: Unidata/netcdf-c#2151)
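As a concrete starting point, here is a hedged sketch of reading from the rgw.met.no test bucket with `s3fs`, passing the endpoint and the credentials explicitly; the bucket/object names and the environment-variable names are placeholders, not an agreed convention.

```python
# Sketch of reading one object from the rgw.met.no test bucket with s3fs.
# Bucket name and object key are placeholders; the credentials are the usual
# aws_access_key_id / aws_secret_access_key pair, read here from environment
# variables rather than hard-coded.
import os

import s3fs

fs = s3fs.S3FileSystem(
    key=os.environ["AWS_ACCESS_KEY_ID"],
    secret=os.environ["AWS_SECRET_ACCESS_KEY"],
    client_kwargs={"endpoint_url": "https://rgw.met.no"},  # s3-compatible endpoint
)

# list the bucket contents and stream one file without downloading it first
print(fs.ls("pya-test-bucket"))
with fs.open("pya-test-bucket/example.nc", "rb") as f:  # hypothetical object
    data = f.read()
print(f"read {len(data)} bytes")
```

`boto3` should work the same way via `boto3.client("s3", endpoint_url=..., aws_access_key_id=..., aws_secret_access_key=...)`, and rclone/s3cmd read the same key pair from their configuration files.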