Xpublish hosting and performance metrics #227

jonmjoyce · 2023-08-28T17:46:27Z


I'd like to start a thread to discuss deployment architectures and gather the typical costs that existing users have experienced.

I'm aware that there isn't a standard architecture for hosting xpublish right now, but several projects are doing it, such as xpublish-host and our RPS server xreds, which we're currently running on Kubernetes.

Right now I believe the limiting factors for xpublish are memory and network bandwidth: memory to load the dataset chunks that users request, and egress to deliver those chunks. The important point is that the working set isn't necessarily all available datasets and variables, just the handful that are actually used, which is typically a fraction of the total data volume. We may run into some CPU constraints when reprojecting more complex models, but I don't know what that looks like yet since we haven't started that work. I know that xpublish-host runs local dask workers, but that can only scale within the hosting instance.
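To make the working-set idea concrete, here is a rough back-of-the-envelope sketch. The chunk shapes, dtype sizes, and counts of "hot" chunks are hypothetical placeholders rather than measurements from xreds or any other deployment, and the estimate ignores decompression buffers and Python overhead.

```python
from math import prod


def chunk_nbytes(chunk_shape, itemsize):
    """Bytes needed to hold one uncompressed chunk in memory."""
    return prod(chunk_shape) * itemsize


def working_set_nbytes(hot_variables):
    """Total in-memory size of the chunks that are actually being requested.

    hot_variables: iterable of (chunk_shape, itemsize, n_hot_chunks) tuples.
    """
    return sum(
        chunk_nbytes(shape, itemsize) * n_hot
        for shape, itemsize, n_hot in hot_variables
    )


# Hypothetical example: two float32 variables chunked as (1, 512, 512),
# with 48 and 24 chunks in active use respectively.
hot = [
    ((1, 512, 512), 4, 48),
    ((1, 512, 512), 4, 24),
]
total = working_set_nbytes(hot)
print(f"{total / 2**20:.0f} MiB")  # prints "72 MiB"
```

Comparing a number like this against the cache size limit gives a first guess at whether the hot chunks fit in memory on a single instance.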

For one sample dataset, the memory for xpublish on xreds with just 2 datasets climbed to 2 GB as the cache grew over a week. We use 2 CPUs and haven't performed any load testing. Does anyone else have metrics they can share? @abkfenris @kthyng

abkfenris · 2023-08-28T18:21:58Z

Maintainer

We don't have much traffic hitting it yet, but we're deploying Xpublish on Kubernetes. Each pod runs a single bare uvicorn worker rather than gunicorn with a collection of workers; the last time I read the FastAPI docs, that was the suggestion for a Docker-based deployment. To scale, we have the Kubernetes deployment run 4 pods and let it handle balancing between them.

Right now it looks like each of those pods is hanging out near 200 MB after running for over a week, but we aren't letting the cache have too much to play with right now.

One of the interesting things about the current cachey.Cache implementation is that it accepts an object that supports the MutableMapping protocol.

One of the things I've been pondering is a shared cache, possibly using Redis or similar. It turns out that Zarr stores need to be a subclass of MutableMapping, so any of the existing stores could be reused as a cache backend.
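As a rough illustration of why the MutableMapping protocol makes this pluggable, here is a minimal sketch of a store that namespaces its keys inside a shared backend. `PrefixedStore` and the key names are hypothetical, and a plain dict stands in for Redis so the example runs on its own; how such an object would be handed to cachey or to Zarr is not shown here.

```python
from collections.abc import MutableMapping


class PrefixedStore(MutableMapping):
    """Hypothetical store that keeps its keys under a prefix in a shared backend.

    `backend` can be any mapping with string keys. Here it is a dict;
    a real shared cache would replace it with calls to a network service.
    """

    def __init__(self, backend, prefix):
        self._backend = backend
        self._prefix = prefix

    def _full(self, key):
        return f"{self._prefix}{key}"

    # The five methods below are all that MutableMapping requires;
    # get(), pop(), update(), `in`, etc. are derived from them.
    def __getitem__(self, key):
        return self._backend[self._full(key)]

    def __setitem__(self, key, value):
        self._backend[self._full(key)] = value

    def __delitem__(self, key):
        del self._backend[self._full(key)]

    def __iter__(self):
        n = len(self._prefix)
        return (k[n:] for k in self._backend if k.startswith(self._prefix))

    def __len__(self):
        return sum(1 for _ in self)


# Two "pods" pointing at the same backend see each other's cached chunks.
shared = {}
pod_a = PrefixedStore(shared, "xpublish:")
pod_b = PrefixedStore(shared, "xpublish:")

pod_a["temp/0.0.0"] = b"\x00" * 16
print(pod_b["temp/0.0.0"] == pod_a["temp/0.0.0"])  # prints True
```

The point is only that the calling code never needs to know what sits behind the mapping, so swapping the in-process dict for a shared service would not change how the cache is used.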


kthyng · 2023-09-13T13:57:16Z


We're using xpublish-host for xpublish, but not in the cloud, just on our own servers. In my experimenting I got as far as having it running (pointing xpublish to a kerchunk file for a large model dataset), but then nothing really worked due to memory issues. Unfortunately, I wasn't able to take the time to troubleshoot.

