Drastic speed difference between GCS and S3 #395
Replies: 4 comments · 15 replies
-
cfgrib codec versions: s3: gcs:
-
Where do I get the "gribberish" codec?
-
I got for GCS:
(where the initial connection includes extra latency for auth and SSL negotiation) and for S3:
.. so, the same?
-
I was able to run the notebook from GCS (us-central1)
-
I am not sure whether this is an fsspec issue or a kerchunk one (probably the former), but I am posting here because it is affecting the kerchunking workflow.
I have the same kerchunked dataset created pointing at s3 and gcs:
s3:
s3://nextgen-dmac/kerchunk/hrrr_subhourly.json
gcs:
gs://squall-hrrr/hrrr_subhourly.json
The s3 version has references to chunks in s3, the gcs version has references to chunks in gcs.
The notebook with the comparison is available here: https://github.com/mpiannucci/ocean-notebooks/blob/main/hrrr_timeseries.ipynb
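For anyone reproducing this: reference JSONs like the ones above are typically opened with xarray's zarr engine through fsspec's `reference://` protocol. A minimal sketch of building the `backend_kwargs` for that call (the `kerchunk_backend_kwargs` helper name and the `anon=True` option are my own, not taken from the notebook):

```python
def kerchunk_backend_kwargs(refs_url, remote_protocol, **remote_options):
    """Build backend_kwargs for xr.open_dataset("reference://", engine="zarr").

    refs_url: location of the kerchunk reference JSON (e.g. the s3:// or gs:// paths above).
    remote_protocol: protocol of the referenced chunks ("s3" or "gcs").
    remote_options: passed through to the remote filesystem (e.g. anon=True).
    """
    return {
        "consolidated": False,
        "storage_options": {
            "fo": refs_url,
            "remote_protocol": remote_protocol,
            "remote_options": remote_options,
        },
    }

# Hypothetical usage (requires network access plus xarray, zarr, and kerchunk):
# import xarray as xr
# ds = xr.open_dataset(
#     "reference://",
#     engine="zarr",
#     backend_kwargs=kerchunk_backend_kwargs(
#         "s3://nextgen-dmac/kerchunk/hrrr_subhourly.json", "s3", anon=True),
# )
```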
The difference in processing time is staggering. There are 72 references per variable, and we load time, u, and v before calling compute, so that is the time to download and transform 216 references.
Load Dataset
Extract Wind Speed and Direction Timeseries
However, when I simply cat the chunk files and the reference JSON, there is no difference in speed between the two backends.
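One way to narrow this down is to wall-clock the raw byte reads separately from the full decode pipeline, per backend. A minimal stdlib sketch (the `timed` helper is hypothetical, not from the notebook):

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn once and print its wall-clock duration, so the raw-read
    time can be compared against the full download+decode path."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    dt = time.perf_counter() - t0
    print(f"{label}: {dt:.3f} s")
    return result, dt

# Hypothetical usage (needs network access and fsspec installed), e.g.:
#   import fsspec
#   with fsspec.open("s3://nextgen-dmac/kerchunk/hrrr_subhourly.json", anon=True) as f:
#       timed("read reference json (s3)", f.read)
```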
Note that I am using my custom grib parser, but I tested the same datasets with the cfgrib codec and got the same results.
Thanks for any help!