Question about dynamic SMs allocation #2

initzhang · 2024-10-25T08:58:02Z

Hello Team,

Thank you for your outstanding work!

I have a question regarding the SM allocation in MuxServe. In your paper, you mentioned:

... parallel runtime dynamically assigns SMs to each job at runtime rather than statically allocating ...

Does this imply that the SM configuration for each process is adjusted on demand while incoming requests are being served? If so, could you please explain how this is accomplished? Is it done by periodically issuing the CUDA MPS control command set_active_thread_percentage <PID> <percentage> for each process?
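To make the question concrete, here is a minimal sketch of the approach I have in mind. It is not taken from MuxServe. The helper names `build_mps_command` and `apply_sm_share` are hypothetical, and the sketch assumes an MPS control daemon is already running and that `nvidia-cuda-mps-control` accepts commands on standard input. My understanding, which may be wrong, is that the PID in this command refers to an MPS server and that the new limit only applies to clients created afterwards, which is part of why I am unsure this is how the dynamic assignment works.

```python
import subprocess


def build_mps_command(server_pid: int, percentage: float) -> str:
    """Format the MPS control command quoted in the question (hypothetical helper)."""
    if not 0 < percentage <= 100:
        raise ValueError("percentage must be in (0, 100]")
    # ':g' drops a trailing '.0' so 50.0 is rendered as '50'.
    return f"set_active_thread_percentage {server_pid} {percentage:g}"


def apply_sm_share(server_pid: int, percentage: float, dry_run: bool = True) -> str:
    """Send the command to the MPS control daemon, or only return it when dry_run is set."""
    cmd = build_mps_command(server_pid, percentage)
    if not dry_run:
        # Assumption: the control binary is on PATH and reads commands from stdin.
        subprocess.run(
            ["nvidia-cuda-mps-control"],
            input=cmd + "\n",
            text=True,
            check=True,
        )
    return cmd


if __name__ == "__main__":
    # Dry run: print the command that would be issued for a made-up PID.
    print(apply_sm_share(1234, 50.0))
```

A scheduler following this idea would call `apply_sm_share(..., dry_run=False)` whenever it wants to change a job's SM share. Is that roughly what the runtime does, or does it use a different mechanism?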
