feature: hpa for jointinference #458

tangming1996 · 2024-12-16T03:26:27Z

What type of PR is this?
/kind feature

What this PR does / why we need it:
In large-model inference, the resource demand of an inference task usually grows sharply as the number of requests increases. The current cloud-edge joint-inference architecture uses a fixed single-instance configuration, which cannot cope with such fluctuations: resources sit idle under low load and become a performance bottleneck under high load. By configuring an HPA (Horizontal Pod Autoscaler) for the deployment, the number of inference instances is adjusted automatically according to the real-time request load. Instances are added during high-load periods and removed during low-load periods, which improves concurrent processing capacity, raises resource utilization, and keeps the inference service efficient and scalable.
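
To illustrate the scaling behaviour described above, here is a minimal sketch of the replica calculation that the Kubernetes HPA documentation describes (`desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`). It is not code from this PR: the function name, the metric values, and the `min_replicas`/`max_replicas` defaults are assumptions chosen for illustration, and the real controller applies further rules (stabilization windows, pod readiness) that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the documented HPA rule for a single metric.

    All parameter names and default bounds are illustrative assumptions;
    0.1 matches the documented default tolerance of the HPA controller.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric

    # Within the tolerance band the controller leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)

    # Clamp to the configured minimum and maximum replica counts.
    return max(min_replicas, min(max_replicas, desired))


# High load: 2 instances at 90% average utilization against a 50% target.
print(desired_replicas(2, 90, 50))
# Low load: 4 instances at 10% average utilization against a 50% target.
print(desired_replicas(4, 10, 50))
```

Under these assumed numbers the first call scales out and the second scales in, which is the high-load and low-load behaviour the description refers to.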
Which issue(s) this PR fixes:

Fixes #

Signed-off-by: ming.tang <[email protected]>

kubeedge-bot · 2024-12-16T03:26:47Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign chendave after the PR has been reviewed.
You can assign the PR to them by writing /assign @chendave in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

feature: hpa for jointinference

ed93801

Signed-off-by: ming.tang <[email protected]>

kubeedge-bot added the kind/feature label (Categorizes issue or PR as related to a new feature.) Dec 16, 2024

kubeedge-bot requested review from jaypume and MooreZheng December 16, 2024 03:26

kubeedge-bot added the size/XXL label (Denotes a PR that changes 1000+ lines, ignoring generated files.) Dec 16, 2024

tangming1996 force-pushed the feature/hpa branch from d9a4d8f to ed93801 December 16, 2024 07:24
