-
Fully support both statements:
Pros and cons
Decouple scheduling and API:
Migrate to event-driven architecture:
We initially thought about going that route, but ended up with the current implementation for a few reasons :) One of them was time to implementation. But eventually we need to migrate to that, for sure!
-
I agree, and the data/event-oriented approach is a must.
-
On structure: I agree with splitting this into separate apps.
On architecture: I'm a bit lukewarm here. Do we have the volume to necessitate a move to an event-driven architecture? Assuming for the sake of argument that we stick with a polling approach for now, how easy would it be to migrate to an event-driven approach in the future if we did need to make that move? All things being equal, I'd advocate for keeping the implementation as simple as possible while still supporting our workloads.
-
Oh, definitely not now :)
-
This is interesting. Once this is released, we can create and watch a RayJob CRD instance so that we get events from Kubernetes.
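As a rough illustration of what "watching" could look like on our side, here is a minimal, dependency-free sketch of a handler for Kubernetes watch events. The event shape (`type` plus `object`) is the standard watch format; the `status.jobStatus` field name and the terminal status values are assumptions about the RayJob CRD and should be checked against the released version. `handle_rayjob_event` and `on_finished` are hypothetical names.

```python
# Assumed terminal values for a RayJob; verify against the released CRD.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "STOPPED"}


def handle_rayjob_event(event, on_finished):
    """Call on_finished(name, status) when a RayJob reaches a terminal status.

    `event` follows the Kubernetes watch shape: {"type": ..., "object": {...}}.
    Returns True when the callback fired, False otherwise.
    """
    # Only creations and updates can carry a new status worth reacting to.
    if event.get("type") not in {"ADDED", "MODIFIED"}:
        return False
    obj = event.get("object", {})
    name = obj.get("metadata", {}).get("name")
    status = obj.get("status", {}).get("jobStatus")  # assumed field name
    if name is None or status not in TERMINAL_STATUSES:
        return False
    on_finished(name, status)
    return True


# Example: a finished job triggers the callback, a running one does not.
finished = []
handle_rayjob_event(
    {"type": "MODIFIED",
     "object": {"metadata": {"name": "demo"}, "status": {"jobStatus": "FAILED"}}},
    lambda name, status: finished.append((name, status)),
)
```

In a real deployment the events would presumably come from the Kubernetes Python client's `watch.Watch().stream(...)` over the custom-objects API rather than from hand-built dicts.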
-
Another benefit to separating the scheduler and gateway would be easier pod name autocompletes 😆
-
For future readers, the context comes from this PR: #570
There are two parts where I would like to see improvements over the current implementation: structure, architecture.
Structure
At the moment, all the logic is fully integrated into the API application. I would like to propose a small refactor around the scheduler, moving it out of the API into a new application called `scheduler`. My main purpose with this change is to separate responsibilities and scopes. From my perspective, this will make the code easier to follow.
Architecture
The current implementation combines `django-commands` with a `polling loop`, which can be difficult to follow and maintain. There are different approaches to solving this problem. The implementation I have in mind is `event-driven`, using a combination of `django-signals` plus `celery`. This implementation would have some core keys:
Something I took into consideration when analyzing the different solutions is that, because of Ray's decentralized scheduler, it can be tricky to determine the status of Ray (I leave some references here):
That is one of the main reasons why I couldn't tie task/event creation to the completion of a Ray job. I am still investigating this approach, though:
I would like to hear your opinions too: @pacomf, @IceKhan13, @psschwei, @akihikokuroda 😄
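To make the event-driven idea a bit more concrete, here is a minimal, dependency-free sketch of the flow: the API only emits an event, a receiver enqueues a task, and a worker in the separate scheduler runs it. `Signal` and `TaskQueue` are tiny stand-ins written for this example (in the real project they would presumably be `django.dispatch.Signal` and a Celery task); `Job`, `job_created`, `schedule_job` and `create_job` are hypothetical names, not existing code.

```python
from collections import deque
from dataclasses import dataclass


class Signal:
    """Tiny stand-in for a Django signal: receivers are called on send()."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)

    def send(self, sender, **kwargs):
        return [(r, r(sender=sender, **kwargs)) for r in self._receivers]


class TaskQueue:
    """Tiny stand-in for a Celery queue: tasks are stored, then run by a worker."""

    def __init__(self):
        self._pending = deque()

    def delay(self, func, *args):
        self._pending.append((func, args))

    def run_worker(self):
        results = []
        while self._pending:
            func, args = self._pending.popleft()
            results.append(func(*args))
        return results


@dataclass
class Job:
    id: int
    status: str = "QUEUED"


job_created = Signal()  # hypothetical event emitted by the API
queue = TaskQueue()
jobs = {}               # in-memory stand-in for the database


def schedule_job(job_id):
    """Scheduler-side task: runs in the worker, not in the API request."""
    jobs[job_id].status = "RUNNING"
    return job_id


def on_job_created(sender, job, **kwargs):
    """Receiver: reacts to the event by enqueueing work instead of polling."""
    queue.delay(schedule_job, job.id)


job_created.connect(on_job_created)


def create_job(job_id):
    """API-side entry point: stores the job and emits the event."""
    job = Job(id=job_id)
    jobs[job_id] = job
    job_created.send(sender=Job, job=job)
    return job


job = create_job(1)
assert job.status == "QUEUED"   # the API returned without scheduling anything
queue.run_worker()
assert job.status == "RUNNING"  # the worker picked the task up later
```

The point of the sketch is the seam: the API never calls the scheduler directly, so the scheduler can live in its own application and the polling loop is no longer needed for this step.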