feat: execute auto-scaling in batches #15420

Merged: 7 commits, Mar 5, 2024

shanicky
Contributor

@shanicky shanicky commented Mar 4, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR adds three parameters for tuning the monitor loop that runs once auto-scaling is enabled.

parallelism_control_batch_size = 10
parallelism_control_trigger_period_sec = 10
parallelism_control_trigger_first_delay_sec = 30

The parallelism_control_batch_size parameter determines how many streaming jobs are processed each time auto-scaling is triggered. When set to 0, all jobs are handled at once.
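The batch-size rule can be sketched in a few lines of Rust. This is only an illustration of the behaviour described above, not the code in this PR: the function name `next_batch` and the use of plain `u32` job IDs are assumptions made for the example.

```rust
/// Hypothetical helper (not from the PR): pick the streaming jobs to handle
/// in one auto-scaling round.
///
/// `pending` is the list of job IDs still waiting to be rescaled.
/// A `batch_size` of 0 means "no limit", so every pending job is returned.
fn next_batch(pending: &[u32], batch_size: usize) -> &[u32] {
    if batch_size == 0 {
        // 0 disables batching: handle all jobs at once.
        pending
    } else {
        // Take at most `batch_size` jobs, or fewer if not that many remain.
        &pending[..batch_size.min(pending.len())]
    }
}
```

With three pending jobs, `next_batch(&[1, 2, 3], 2)` yields the first two, while a batch size of 0 or 10 yields all three.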

The parallelism_control_trigger_period_sec parameter determines the period for triggering auto-scaling, with the default set to every 10 seconds.

The parallelism_control_trigger_first_delay_sec parameter determines the delay before the first scale-out, which avoids unnecessary conflicts with recovery. It will be deprecated after the stream manager and barrier manager are merged.
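To see how the three settings interact, here is a rough Rust sketch of the resulting schedule. The struct name `ParallelismControl` and both methods are hypothetical; only the three parameter names and their default values come from the description above. It also assumes that the first trigger fires exactly after the first delay, that later triggers follow at a fixed period, and that every round handles a full batch, which the real monitor loop may not guarantee.

```rust
use std::time::Duration;

/// Hypothetical container for the three settings. The struct itself is an
/// assumption; the defaults mirror the values listed in the PR description.
struct ParallelismControl {
    batch_size: usize,
    trigger_period_sec: u64,
    trigger_first_delay_sec: u64,
}

impl Default for ParallelismControl {
    fn default() -> Self {
        Self {
            batch_size: 10,
            trigger_period_sec: 10,
            trigger_first_delay_sec: 30,
        }
    }
}

impl ParallelismControl {
    /// Time from startup to the n-th trigger (0-based), assuming a fixed
    /// first delay followed by strictly periodic triggers.
    fn trigger_offset(&self, n: u64) -> Duration {
        Duration::from_secs(self.trigger_first_delay_sec + n * self.trigger_period_sec)
    }

    /// Number of rounds needed to cover `jobs` streaming jobs, assuming each
    /// round handles a full batch. A batch size of 0 means one round.
    fn rounds_needed(&self, jobs: usize) -> usize {
        if jobs == 0 {
            0
        } else if self.batch_size == 0 {
            1
        } else {
            (jobs + self.batch_size - 1) / self.batch_size
        }
    }
}
```

Under these assumptions and the default values, 25 jobs would take three rounds, triggered roughly 30, 40, and 50 seconds after startup.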

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

@shanicky shanicky requested a review from yezizp2012 March 4, 2024 14:01
@shanicky shanicky changed the title from "feat: Automatic scaling supports step expansion." to "feat: execute auto-scaling in batches" Mar 4, 2024
Member

@yezizp2012 yezizp2012 left a comment

Rest LGTM. This PR addresses the problem that the memory overhead of decoding barrier mutations during auto-scaling may cause compute node (CN) OOM. See details in #14533; I'm raising a PR to fix it right now.

src/common/src/config.rs (outdated, resolved)
@shanicky shanicky enabled auto-merge March 5, 2024 09:07
@shanicky shanicky added this pull request to the merge queue Mar 5, 2024
Merged via the queue into main with commit d0ae778 Mar 5, 2024
27 checks passed
@shanicky shanicky deleted the peng/step-scale branch March 5, 2024 10:58
github-actions bot pushed a commit that referenced this pull request Mar 5, 2024
shanicky added a commit that referenced this pull request Mar 7, 2024
shanicky added a commit that referenced this pull request Mar 7, 2024
shanicky added a commit that referenced this pull request Mar 8, 2024
shanicky added a commit that referenced this pull request Mar 11, 2024