Authentication timeout when requesting cmip6 data from "cil-gdpcir-cc0" #388
Out of curiosity, when that last step is running, what network download speed do you see? E.g., if you open Task Manager and go to Performance, then Ethernet.
@777arc, during the execution of cell 11 (using the original ensemble.ipynb) I saw network speeds between 4,000 and 17,000 KB/s for the associated Python process, which took 260 seconds to complete. After it completed, I looked at [...]. Also, I noticed that in the cell outputs saved in the version of the notebook online, cell 11 only took 9 seconds to complete, which is significantly faster than what I am experiencing.
This one took about 80 s for me, but yeah, let me try to figure out why it's downloading GBs of data to only produce a few MB of an xarray.
Turns out there are some hints as to why this is the case in the output of the 2nd cell: "The data is chunked at each interval of 365 days and 90 degree interval of latitude and longitude. Therefore, each chunk is [...]"
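For scale, here's a back-of-envelope estimate of one chunk's size. The 0.25-degree grid spacing and float32 dtype are assumptions (actual dtype and on-disk compression may differ), but it illustrates why even a single-point query pulls large chunks:

```python
# Back-of-envelope chunk size, assuming a 0.25-degree grid and float32
# values (both are assumptions; actual dtype/compression may differ).
days = 365                        # one chunk spans 365 daily time steps
degrees = 90                      # and a 90-degree lat/lon span
resolution = 0.25                 # assumed grid spacing in degrees
cells = int(degrees / resolution)     # 360 grid cells per 90-degree span
bytes_per_value = 4               # assumed float32
chunk_bytes = days * cells * cells * bytes_per_value
print(f"~{chunk_bytes / 1e6:.0f} MB per uncompressed chunk")  # ~189 MB
```

So reading one point's full time series still touches every ~189 MB chunk along the time axis, which matches the GBs-downloaded-for-MBs-kept behavior described above.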
Thanks @777arc, that provides some clarity on the download size discrepancy. I understand that the data fetching cannot be sped up, but is there a way to prevent the SAS token from expiring during long downloads? My original issue is that my data fetches were resulting in authentication errors because the downloads were taking longer than the token lifetime.
Ah ok, yeah I think it's set to a 45-minute expiration right now. We can't increase that, so what I would do is break up the time range so that you grab a new token between each time step, or every N time steps, since you're interested in such a huge span of time.
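One way to structure that splitting is to precompute the sub-ranges, then re-run the STAC search (which issues a fresh SAS token) before fetching each one. A minimal sketch — `time_chunks` is an illustrative helper, not part of any Planetary Computer API, and the commented-out `catalog.search`/`fetch_chunk` lines stand in for your own fetch logic:

```python
def time_chunks(start_year, end_year, step_years):
    """Split [start_year, end_year] into consecutive date ranges of
    at most `step_years` years each."""
    chunks = []
    year = start_year
    while year <= end_year:
        last = min(year + step_years - 1, end_year)
        chunks.append((f"{year}-01-01", f"{last}-12-31"))
        year = last + 1
    return chunks

# e.g. ~85 years of daily data in 10-year slices;
# each iteration re-searches, so each slice gets a fresh SAS token
for t0, t1 in time_chunks(2015, 2099, 10):
    # search = catalog.search(..., datetime=f"{t0}/{t1}")  # fresh token here
    # ds = fetch_chunk(search)  # illustrative: your per-slice processing
    pass
```

Each slice then has to finish within one 45-minute token lifetime, rather than the whole download.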
Okay that makes sense. Do you have a code snippet for generating and using new SAS tokens on the fly that you could share? |
And to confirm, there is currently no option to use a permanent API key with planetary computer, correct? |
There is no permanent API key because no API key is needed for the Planetary Computer; it's totally public. It's the blob storage SAS token that only lasts 45 minutes. It gets generated when you do a search for items, to go along with each asset, so that you can download it freely from our blob storage (for 45 minutes). So what you can do is run your initial search to get the list of item IDs:

```python
search = catalog.search(
    collections=["cil-gdpcir-cc0", "cil-gdpcir-cc-by"],
    query={"cmip6:experiment_id": {"eq": "ssp370"}},
)
items = search.item_collection()
ids = [item.id for item in items]
print(ids)
```

then loop through each item and do a search on it to make sure the token is fresh, then do whatever processing you want:

```python
from urllib.parse import unquote

first_id = ids[0]  # you'll loop through instead; just showing one as an example
search = catalog.search(
    collections=["cil-gdpcir-cc0", "cil-gdpcir-cc-by"],
    ids=[first_id],
)
item0 = search.item_collection()[0]
# this is where the SAS token is stored
cred = item0.assets["pr"].to_dict()["xarray:open_kwargs"]["storage_options"]["credential"]
cred = unquote(cred)
print(cred.split('&')[1])  # this is the expiration-date portion of the token
```

It's possible there's some sort of permanent key we can arrange though; email [email protected] with the details of what dataset you're using. The last section here might also help: https://planetarycomputer.microsoft.com/docs/concepts/sas/
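Rather than picking the expiry out by position with `split('&')[1]` (field order in a SAS token isn't guaranteed), you could parse the token's `se` (signed expiry) field by name; that also lets you decide programmatically when to re-search. A hedged sketch — the `needs_refresh` margin and helper names are my own, not from the Planetary Computer SDK:

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, unquote

def sas_expiry(credential):
    """Return the expiry of an Azure SAS token ('se' field) as a datetime."""
    fields = parse_qs(unquote(credential))
    return datetime.fromisoformat(fields["se"][0].replace("Z", "+00:00"))

def needs_refresh(credential, margin_s=300):
    """True when the token expires within `margin_s` seconds."""
    remaining = sas_expiry(credential) - datetime.now(timezone.utc)
    return remaining.total_seconds() < margin_s

# example with a fake (illustrative) token string
tok = "st=2024-01-01T00:00:00Z&se=2024-01-01T00:45:00Z&sp=rl&sig=abc"
print(sas_expiry(tok))  # 2024-01-01 00:45:00+00:00
```

In the loop above, you'd call `needs_refresh(cred)` before each fetch and only re-run the search when the remaining lifetime drops below your margin.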
If I am following correctly, the expiration date should update each time you run the search, and re-running it produces a new expiration timestamp.
Alternatively, if you can run your code in the Azure West Europe region, we will issue longer-lived tokens, and by collocating your compute with the data region you will get faster data reads.
Hello,
When I attempt to fetch cmip6 data from the "cil-gdpcir-cc0" collection, the download is abnormally slow and often ends in an authentication timeout. For example, fetching "tasmin" for a single point, for a single model/ssp, over the entire time period (~85 years of daily time points) takes over 20 minutes and ultimately gives the following error:
To confirm this was not an issue with my code, I attempted to run the ensemble.ipynb from the planetary computer examples. I edited cell 10 to:
which extends the time period to 10 years and I obtained a similar error to what I saw in my own code.
In both examples, the data I am fetching is pretty small, so it surprises me that the downloads extend past the authentication time period.
I saw that #109 brought up some similar concerns and the suggestion was to use a subscription key or work on Planetary Computer Hub. I have not been able to find a way to obtain a subscription key and the Hub has been discontinued, so I am looking for other routes to address this timeout issue.