
remove replica count on worker when hpa enabled #288

Closed

Conversation

@anaselmhamdi commented Nov 22, 2023

What

The current hpa parameter in values.yaml doesn't actually do anything for the worker. Worse, when you try to apply your own HPA configuration, the replicaCount rendered into the worker Deployment blocks the scaling altogether.

I'm not proposing extra config for the HPA, simply the option to omit the replica count if you bring your own HPA.

Relates to this: airbytehq/airbyte#25141

How

Added a conditional that omits spec.replicas from the worker Deployment when worker.hpa.enabled is true.
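For illustration, a minimal sketch of that conditional, assuming the chart exposes worker.hpa.enabled and worker.replicaCount and uses an airbyte.fullname helper (the real template in the chart may differ):

```yaml
# templates/worker/deployment.yaml (illustrative sketch, not the chart's actual file)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "airbyte.fullname" . }}-worker
spec:
  {{- if not .Values.worker.hpa.enabled }}
  # Only render a fixed replica count when no HPA owns scaling; otherwise
  # every `helm upgrade` would reset the Deployment back to replicaCount.
  replicas: {{ .Values.worker.replicaCount }}
  {{- end }}
  # ... selector and pod template unchanged
```

Leaving spec.replicas unset lets the HPA remain the sole owner of the replica count across upgrades.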


Can this PR be safely reverted / rolled back?

If you know that your PR is backwards-compatible and can be simply reverted or rolled back, check the YES box.

Otherwise, if your PR has a breaking change (a database migration, for example), check the NO box.

If unsure, leave it blank.

  • YES 💚
  • NO ❌

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

@stevenmurphy12

Hi @anaselmhamdi, I'm curious whether you took this any further after closing your PR.

I'm just starting to look into performance tuning for my Airbyte deployment. Without HPA support, my only option for now appears to be over-provisioning the workers.

@alexlopes

Same here, HPA is a must-have option. @stevenmurphy12, did you end up over-provisioning capacity since then?

@stevenmurphy12

@alexlopes - Yes, over-provisioned, though our usage is still fairly light at the moment. By over-provisioned, I mean just two workers.

My understanding is that an Airbyte worker can handle 5 sync jobs concurrently, though that can be tuned with an environment variable. Our jobs run at different times anyway, so I've not done a proper load test of what would happen if 10+ jobs kicked off at the same time; presumably they'd get queued in Temporal. A delay is probably acceptable, to be honest.

@alexlopes

@stevenmurphy12 - Got it, this is important now that I'm deploying Airbyte. The data team has some heavy, usage-driven ETL processes, so I'll likely take the opportunity to do some load tests and create an HPA template targeting the optimal amount of resources. If that works out I can share it back here 🤞
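For anyone wanting a starting point, a hypothetical HPA targeting the worker, assuming the Deployment is named airbyte-worker and that CPU utilization is a reasonable first scaling signal (min/max and the target would come out of the load tests):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: airbyte-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: airbyte-worker   # assumed name; match your release's worker Deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
```

Note this only helps if the chart stops pinning spec.replicas on the worker, which is exactly what this PR proposed.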

@stevenmurphy12 commented May 29, 2024

@alexlopes I played with MAX_SYNC_WORKERS; setting it to 1 had the expected effect of running one sync at a time and queueing the rest.
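For context, one way to set that in the Helm values, assuming the chart exposes worker.extraEnv for injecting environment variables (the exact hook may vary by chart version):

```yaml
worker:
  extraEnv:
    - name: MAX_SYNC_WORKERS
      value: "1"   # one sync at a time; additional syncs queue in Temporal
```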

For the time being I've moved away from worker auto-scaling and am instead focusing on the resource requirements of the ephemeral pods spun up as part of a sync. I plan to run these in their own nodepool with a node size appropriate to their K8s requests/limits.
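As a sketch of that direction, routing the job pods to a dedicated nodepool and bounding their resources via Airbyte's job-level environment variables; the variable names and label below reflect my understanding and are worth verifying against your Airbyte version:

```yaml
worker:
  extraEnv:
    # Schedule sync job pods onto a nodepool labelled for Airbyte jobs (assumed label)
    - name: JOB_KUBE_NODE_SELECTORS
      value: "workload=airbyte-jobs"
    # Default requests/limits for the main container of each job pod
    - name: JOB_MAIN_CONTAINER_CPU_REQUEST
      value: "500m"
    - name: JOB_MAIN_CONTAINER_CPU_LIMIT
      value: "1"
    - name: JOB_MAIN_CONTAINER_MEMORY_REQUEST
      value: "1Gi"
    - name: JOB_MAIN_CONTAINER_MEMORY_LIMIT
      value: "2Gi"
```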

Something to consider with your

> heavy ETL processes

is whether by "heavy" you mean a lot of syncs running concurrently, syncs with heavy resource requirements, or both. The source and destination connector definitions have the option of setting K8s resource requests/limits on them (through the config API, not exposed in the UI), though I've not got as far as experimenting with those yet.
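For reference, the rough shape of a per-connector resourceRequirements block as I understand it; treat the field names and job types below as assumptions to verify against the config API spec for your Airbyte version:

```yaml
resourceRequirements:
  default:             # applied to all jobs for this connector definition
    cpu_request: "250m"
    cpu_limit: "1"
    memory_request: "512Mi"
    memory_limit: "1Gi"
  jobSpecific:
    - jobType: sync    # override just for sync jobs
      resourceRequirements:
        cpu_limit: "2"
        memory_limit: "2Gi"
```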
