s3 object store #600
Replies: 6 comments
-
Just as a comment: we need the iris library to read the data. Using an https request to the s3 bucket should be possible right away (since the netcdf library should have that part compiled in), but iris does not talk s3 natively.
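To make that distinction concrete, here is a minimal sketch (not pyaerocom code): it assumes a publicly readable bucket and a hypothetical object URL on the rgw.met.no test bucket. The netCDF library can read directly over plain https via its byte-range driver, while iris needs a local file, so the usual workaround is to download first.

```python
# Minimal sketch, not pyaerocom code: assumes a publicly readable bucket and a
# hypothetical object URL on the rgw.met.no test bucket.
import tempfile
import urllib.request

import iris
import netCDF4

url = "https://rgw.met.no/pya-test-bucket/example.nc"  # hypothetical object

# 1) plain https: the netCDF-C byte-range driver is enabled by appending
#    "#mode=bytes" to the URL, no s3-specific tooling needed
ds = netCDF4.Dataset(url + "#mode=bytes")
print(list(ds.variables))
ds.close()

# 2) iris does not talk s3 or plain https objects natively, so download to a
#    local temporary file and load that instead
with tempfile.NamedTemporaryFile(suffix=".nc") as tmp:
    urllib.request.urlretrieve(url, tmp.name)
    cubes = iris.load(tmp.name)
    print(cubes)  # note: iris loads lazily; realise data before tmp is deleted
```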
-
Reading briefly through the above-mentioned documentation at https://www.unidata.ucar.edu/blogs/developer/entry/overview-of-zarr-support-in, I see that the netcdf library actually has some S3 support built in now. But this cloud support lacks support for an unlimited dimension (which 98% of our data files have) and for netcdf strings (which around 5% of our data uses). Not sure if this is the right time to look further into it.
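If it helps to gauge the impact, a small audit along these lines could report which files hit those two limitations; the data directory below is a placeholder and would need to point at the actual data tree.

```python
# Rough sketch: scan netCDF files for the two NCZarr blockers mentioned above
# (unlimited dimensions, string variables). "/path/to/data" is a placeholder.
from pathlib import Path

import netCDF4


def nczarr_blockers(path):
    """Return (has_unlimited_dim, has_string_var) for one netCDF file."""
    with netCDF4.Dataset(path) as ds:
        has_unlimited = any(d.isunlimited() for d in ds.dimensions.values())
        has_string = any(v.dtype == str for v in ds.variables.values())
    return has_unlimited, has_string


for nc in Path("/path/to/data").rglob("*.nc"):
    unlimited, strings = nczarr_blockers(nc)
    if unlimited or strings:
        print(f"{nc}: unlimited_dim={unlimited} string_vars={strings}")
```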
-
The zarr/S3 support of …
-
Is it worth looking at this further at this time, then? Or do we just follow the netCDF library's development until we think it might work?
-
The development in …
-
Iris is more than a netcdf reader for pyaerocom. Examples are concatenation (including unifying the attributes), unit handling, and probably more I don't know about yet. So these things would need replacement as well if we are to replace the reader.
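For reference, a short sketch of the two iris features named above (attribute unification before concatenation, and unit handling); the file names are hypothetical, and this is only meant to illustrate what a replacement reader would have to cover.

```python
# Sketch of the iris features mentioned above; file names are hypothetical.
import iris
from iris.util import equalise_attributes

cubes = iris.load(["od550aer_2010.nc", "od550aer_2011.nc"])  # hypothetical files

# Concatenation along time, after unifying attributes that would otherwise
# block it (e.g. differing history/creation_date attributes per file).
equalise_attributes(cubes)
cube = cubes.concatenate_cube()

# Unit handling via cf-units: convert in place, with consistency checks
# (AOD is dimensionless, so "1" here; other variables need real conversions).
cube.convert_units("1")
print(cube.summary(shorten=True))
```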
-
- capability to use s3-compatible object stores for data needed to run the tests or required to be available outside of the internal network
- we have currently set up 2 test buckets for pyaerocom to use, one on a min.io server set up through kubernetes:
  https://minio.test-charlien.k8s.met.no/minio/pya-test-bucket/
  and one through the recently created Met object storage solution (https://gitlab.met.no/it/infra/ostack-doc/-/blob/master/External_objectstore_s3.md):
  https://rgw.met.no/6f4fcddae54549dbbe12044fa1dfda7c:pya-test-bucket
- interaction with these buckets can be done with a number of tools, like `rclone` or `s3cmd`, or python libraries such as `s3fs`, `boto`, or `boto3` (they require either a configuration file with credentials such as `aws_access_key_id` and `aws_secret_access_key`, or for these or similar credentials to be set up when creating a connection to the bucket); see the sketch after this list. I am unsure how to best share credentials in "production", but at this testing stage we can just make them public I guess?
- in principle netCDF tools can also interact directly with s3-compatible object stores (see https://www.unidata.ucar.edu/blogs/developer/entry/overview-of-zarr-support-in), but at the moment NCZarr support seems to still be buggy (Magnus has opened an issue with them: Unidata/netcdf-c#2151)
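As a concrete starting point, here is a hedged sketch of reading from the rgw.met.no test bucket with `s3fs`, passing the endpoint and the credentials explicitly; the bucket/object names and the environment-variable names are placeholders, not an agreed convention.

```python
# Sketch of reading one object from the rgw.met.no test bucket with s3fs.
# Bucket name and object key are placeholders; the credentials are the usual
# aws_access_key_id / aws_secret_access_key pair, read here from environment
# variables rather than hard-coded.
import os

import s3fs

fs = s3fs.S3FileSystem(
    key=os.environ["AWS_ACCESS_KEY_ID"],
    secret=os.environ["AWS_SECRET_ACCESS_KEY"],
    client_kwargs={"endpoint_url": "https://rgw.met.no"},  # s3-compatible endpoint
)

# list the bucket contents and stream one file without downloading it first
print(fs.ls("pya-test-bucket"))
with fs.open("pya-test-bucket/example.nc", "rb") as f:  # hypothetical object
    data = f.read()
print(f"read {len(data)} bytes")
```

`boto3` should work the same way via `boto3.client("s3", endpoint_url=..., aws_access_key_id=..., aws_secret_access_key=...)`, and rclone/s3cmd read the same key pair from their configuration files.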