This repository has been archived by the owner on May 28, 2024. It is now read-only.
Thank you for the great package. I'm interested in hosting an LLM on GKE.
For our existing ML applications, we usually implement a queue-worker system (e.g. redis-queue or Celery with a Redis broker) to handle long-running background tasks. Does ray-llm have a similar feature implemented under the hood, or do I need to set it up myself?
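For context, here is a rough sketch of the kind of queue-worker setup we use today with RQ and Redis. The task function, queue name, and `/generate` endpoint are just placeholders for our own model server, not anything from ray-llm:

```python
# tasks.py -- hypothetical worker task; the endpoint URL is a placeholder.
import requests

def run_llm_inference(prompt: str) -> str:
    # Long-running call to whatever model server sits behind the queue.
    resp = requests.post(
        "http://model-server:8000/generate",
        json={"prompt": prompt},
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["text"]
```

```python
# enqueue.py -- the API process only enqueues; a separate
# `rq worker llm-jobs` process executes jobs in the background.
from redis import Redis
from rq import Queue

from tasks import run_llm_inference

queue = Queue("llm-jobs", connection=Redis(host="redis", port=6379))
job = queue.enqueue(run_llm_inference, "Summarize this document.", job_timeout=900)
print(job.id)  # Clients poll job.get_status() / job.result later.
```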
@sihanwang41 Thank you for your reply. I saw there is an RFC related to integrating a queuing system into Ray Serve: ray-project/ray#32292. So I was wondering whether that is something Ray-LLM would consider supporting, especially since LLM inference usually takes a long time to run.
In the meantime, we can set up the queuing system ourselves.
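As a minimal sketch of what we would set up ourselves, something like Celery with a Redis broker in front of the serving endpoint; the broker settings, task name, and `/generate` URL below are assumptions on our side, not part of ray-llm:

```python
# llm_tasks.py -- hypothetical Celery task calling the model's HTTP endpoint.
import requests
from celery import Celery

app = Celery(
    "llm_tasks",
    broker="redis://redis:6379/0",
    backend="redis://redis:6379/1",
)

@app.task(bind=True, max_retries=2)
def generate(self, prompt: str) -> str:
    try:
        # Long-running inference request against the serving endpoint.
        resp = requests.post(
            "http://model-server:8000/generate",
            json={"prompt": prompt},
            timeout=600,
        )
        resp.raise_for_status()
        return resp.json()["text"]
    except requests.RequestException as exc:
        # Retry transient failures (e.g. a replica restarting).
        raise self.retry(exc=exc, countdown=10)

# Callers submit work asynchronously, e.g. generate.delay("Write a haiku"),
# and poll the returned AsyncResult for completion.
```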