Discussion: admission control done in Kernel? #7347

Closed
lmatz opened this issue Jan 12, 2023 · 4 comments

Comments

@lmatz
Contributor

lmatz commented Jan 12, 2023

When a compute node is already under pressure, i.e. high CPU and memory usage because the workload is intensive or too many queries are already running concurrently, it makes little sense to allow new queries to be scheduled on it. We could return a warning, or disallow scheduling but leave a backdoor to schedule jobs onto the node anyway (e.g. the user insists, for stress testing, etc.).

Some options:

  1. If CPU/memory usage has stayed above a threshold for XX minutes, then issuing `create materialized view` to RisingWave returns a notice.
  2. Compare the number of CPUs on the compute node with the total parallelism taken by all jobs, although a job may not fully utilize the resources assigned to it.
  3. ......
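
To make options 1 and 2 concrete, here is a minimal sketch in Python. It is not RisingWave code: `NodeStats`, `admit`, the 0.9 threshold, the 5-sample window, and the `force`/`strict` flags are all hypothetical names and values chosen for illustration, and it assumes the compute node reports one usage sample per minute.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ADMIT = "admit"
    NOTICE = "notice"  # admit, but return a notice to the user
    REJECT = "reject"


@dataclass
class NodeStats:
    """Hypothetical snapshot reported by a compute node."""
    cpu_cores: int
    total_parallelism: int   # parallelism already taken by all jobs
    cpu_usage: list          # recent samples in [0, 1], assumed one per minute
    mem_usage: list


def sustained_above(samples, threshold, window):
    """True if the last `window` samples all exceed `threshold`."""
    if len(samples) < window:
        return False
    return all(s > threshold for s in samples[-window:])


def admit(stats, new_parallelism, *, threshold=0.9, window=5,
          strict=False, force=False):
    """Decide whether a new job may be scheduled on this node."""
    if force:
        # Backdoor: the user insists, e.g. for stress testing.
        return Decision.ADMIT
    # Option 1: usage has stayed above the threshold for `window` minutes.
    under_pressure = (
        sustained_above(stats.cpu_usage, threshold, window)
        or sustained_above(stats.mem_usage, threshold, window)
    )
    # Option 2: total parallelism would exceed the number of CPUs.
    oversubscribed = stats.total_parallelism + new_parallelism > stats.cpu_cores
    if under_pressure or oversubscribed:
        return Decision.REJECT if strict else Decision.NOTICE
    return Decision.ADMIT
```

With `strict=False` the check only produces a notice, which matches option 1; `strict=True` corresponds to disallowing the job, and `force=True` is the backdoor.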

I think this is eventually a must-do for Cloud.
But the question is what kind of input the admission control needs from the kernel to work well.
Also, I suppose the nicer way is to alert users proactively, in advance, instead of returning a warning/error message only after they try to schedule a new query.

@github-actions github-actions bot added this to the release-0.1.16 milestone Jan 12, 2023
@lmatz
Contributor Author

lmatz commented Jan 12, 2023

I checked the MRD; it lists three requirements:

  1. Provide users with job admission control using prioritization at workload or use case level
  2. Provide configuration rules to establish workload admission and execution controls to better optimize performance in meeting customer SLAs
  3. Provide users with job admission control using cost-based prioritization at tenant level
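
For reference, requirements 1 and 3 could look roughly like the following if they lived in a layer outside the kernel. This is only a sketch under my own assumptions: the MRD does not specify a policy, and `AdmissionQueue`, the priority numbers, the cost units, and the per-tenant budgets are all hypothetical.

```python
import heapq
import itertools


class AdmissionQueue:
    """Hypothetical queue: workload priority first, then cheapest job first."""

    def __init__(self, capacity, tenant_budget):
        self.capacity = capacity                  # free resource units
        self.tenant_budget = dict(tenant_budget)  # remaining units per tenant
        self._heap = []
        self._seq = itertools.count()             # tie-breaker, keeps FIFO order

    def submit(self, name, tenant, priority, cost):
        # heapq is a min-heap, so negate priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, cost, next(self._seq), name, tenant))

    def admit_next(self):
        """Admit jobs in priority order while capacity and tenant budget allow."""
        admitted, deferred = [], []
        while self._heap:
            entry = heapq.heappop(self._heap)
            _, cost, _, name, tenant = entry
            if cost <= self.capacity and cost <= self.tenant_budget.get(tenant, 0):
                self.capacity -= cost
                self.tenant_budget[tenant] -= cost
                admitted.append(name)
            else:
                deferred.append(entry)
        # Jobs that did not fit stay queued for the next round.
        for entry in deferred:
            heapq.heappush(self._heap, entry)
        return admitted
```

Note that the only thing this needs from the kernel is an estimate of cost and of free capacity, which supports the point that the policy itself does not have to be built into the Kernel.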

But I just feel none of these features should be built into the Kernel?

@neverchanje
Contributor

neverchanje commented Jan 20, 2023

Thanks for raising this question. I suggest we research how mature products in the industry handle resource isolation and management. To my knowledge, every system deployed on-prem requires careful resource arrangement so that some jobs can be prioritized over others. But such a strategy is never perfect. In my experience, the users (who have to be aware of these resource issues) have to set the correct expectations (SLAs) for their jobs. It requires top-down alignment across the whole company to know which departments are high priority.

@lmatz lmatz removed this from the release-0.1.17 milestone Feb 6, 2023
@github-actions
Contributor

github-actions bot commented Apr 8, 2023

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

@lmatz
Contributor Author

lmatz commented Sep 24, 2024

Solved by #18383.

@lmatz lmatz closed this as completed Sep 24, 2024

3 participants