Documenting best practices for resource allocation when pre-warming a hub for an event #1594
Also sharing @consideRatio's answer from Slack for more context: "Decisions on what nodes to use could reasonably be delegated, but we can come up with recommendations as well."
Goals:
Overall, this is a very complicated topic. One can optimize for many things, and depending on the resource requests and expected user activity, one may opt for very different hardware.
So I'm setting up a hub for an event in #2049, but I think the hub experience can be improved by using larger nodes that users share instead of allocating an individual node per user. I think we should provide recommended setups and clarify the benefits of sharing a few larger nodes: lower cost, and a better UX with regard to startup time, etc.
I opened #2121, which I think relates closely to this.
I think this is, to a large extent, blocked by #3030, and we also need a feature to not always optimize for the smallest available node for a given resource allocation request. I opened #3293 to track that. I think the information we need from community reps is the expected number of users and the resource allocation requests they plan to use; that would allow us to optimize for the event quite well.
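To illustrate how those two inputs (expected users and per-user requests) turn into a pre-warm node count, here is a minimal sketch. The helper name `nodes_to_prewarm` and all numbers are hypothetical; it assumes every user gets the same request and that you already know the node type's allocatable capacity (what remains after system reservations), which varies by cloud provider and machine type.

```python
import math


def nodes_to_prewarm(
    expected_users: int,
    mem_request_gib: float,
    cpu_request: float,
    node_alloc_mem_gib: float,
    node_alloc_cpu: float,
) -> int:
    """Hypothetical helper: nodes needed so every expected user's pod can be scheduled.

    Assumes every user has the same memory/CPU *request* and that the node's
    allocatable capacity (after system reservations) is known.
    """
    # The Kubernetes scheduler places pods by their requests, so whichever
    # resource runs out first limits how many users fit on one node.
    users_by_mem = math.floor(node_alloc_mem_gib / mem_request_gib)
    users_by_cpu = math.floor(node_alloc_cpu / cpu_request)
    users_per_node = min(users_by_mem, users_by_cpu)
    if users_per_node < 1:
        raise ValueError("one user's request does not fit on this node type")
    # Round up: a partially filled node is still a whole node to pre-warm.
    return math.ceil(expected_users / users_per_node)


# Made-up numbers: 100 users requesting 4 GiB / 0.5 CPU each, on a node type
# with roughly 28 GiB and 7.5 CPU allocatable (memory is the limit here).
print(nodes_to_prewarm(100, 4, 0.5, 28, 7.5))  # 15
```

In this example memory is the binding constraint (7 users per node), so 15 nodes would need to be up before the event starts.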
Thanks @consideRatio! ✨
I believe the hub's sharing options for everyday usage and those for events tend to differ. Specifically for events, we should have a written policy that we apply, or a set of guidelines that we check, to decide whether we need to make adjustments to the infrastructure. So I don't think we should block writing these event docs/guidelines on creating a utility that would guide us in setting up the choices for everyday usage based on a chosen strategy, which is what I understand #3030 is trying to achieve. Also, the more we try to make #3030 perfect and cover all cases and strategies, the harder it will be to move it forward, and until then we are in a weird state where:
Yes, I've noticed community reps actually using the template in https://docs.2i2c.org/community/events/#notify-the-2i2c-team-about-the-event, which I think is great, but it means we should make sure we keep that info up to date (this is what I have in mind when I say "community-facing docs").
Ah, I've just noticed #3293! That makes sense. Motivation would be:
I really believe that having such docs will reduce stress, fatigue, and load on the engineering team.
Update 2023
We are now using node sharing for a more effective resource allocation. See #2121 for more details.
With this resource allocation, we can empower admins to pre-warm the hubs by themselves before an event, by carefully choosing the machine types (depending on their expected usage and number of users).
Specifically in the context of an event, we should document current resource allocation practices:
Context
I believe it would be super useful to define some 2i2c-specific best practices for when we're pre-warming the hubs for events and how we're supposed to be choosing:
From what I've seen of last year's events, it looks like we used to follow an approach of fewer than 10 pods per node and a high maximum node count for the autoscaler. But recently I've seen recommendations to use more powerful machines that fit more pods, with fewer nodes in the node pool, so that CPU and memory are used more efficiently.
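One way to reason about the trade-off between many small nodes and fewer large ones is a quick back-of-the-envelope comparison. The sketch below is hypothetical: the `NodeType` sizes and the fixed per-node overhead (standing in for system pods and reservations) are made-up assumptions, and it only looks at memory.

```python
import math
from dataclasses import dataclass


@dataclass
class NodeType:
    name: str
    mem_gib: float       # total memory on the machine
    overhead_gib: float  # assumption: memory taken by system pods / reservations


def compare_node_types(node_types, expected_users, mem_request_gib):
    """Hypothetical helper: nodes needed and memory utilization per node type."""
    rows = []
    used_gib = expected_users * mem_request_gib
    for nt in node_types:
        # Per-node overhead is paid once per node, so bigger nodes amortize it.
        users_per_node = math.floor((nt.mem_gib - nt.overhead_gib) / mem_request_gib)
        if users_per_node < 1:
            raise ValueError(f"{nt.name}: one user's request does not fit")
        nodes = math.ceil(expected_users / users_per_node)
        # Fraction of the memory we pay for that users actually request.
        utilization = used_gib / (nodes * nt.mem_gib)
        rows.append((nt.name, users_per_node, nodes, utilization))
    return rows


# Made-up machine sizes and overhead; 120 users requesting 4 GiB each.
types = [NodeType("small", 32, 4), NodeType("large", 128, 4)]
for name, per_node, nodes, util in compare_node_types(types, 120, 4):
    print(f"{name}: {per_node} users/node, {nodes} nodes, {util:.0%} of memory requested")
```

With 120 users the large nodes need only 4 machines (vs 18) and waste less memory, but rounding up to whole large nodes cuts the other way for some user counts: with 100 users in this sketch, the small nodes come out slightly ahead on utilization. That is one reason the expected number of users matters when picking a machine type.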
Proposal
I know there are pros and cons to each of these approaches, but what are some key factors that might impose one approach instead of the other? Or, more specifically, what could be the questions we could ask the communities about their workflow in order to make a more informed decision?
Updates and actions