Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(KONFLUX-5670): Define computeResources for prefetch-dependencies* tasks #1763

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jhutar
Copy link
Contributor

@jhutar jhutar commented Dec 17, 2024

OK, I was trying to reproduce the issue KONFLUX-5951 and was able to do so with Gomod Cachi2 configured. I do not know what caused that issue, but noticed that for me all of 50 concurrent prefetch-dependencies pods were running on the same node!

I have picked 1 CPU and 3 GiB of memory as a request (and in an effort to slowly migrate to model where we have requests == limits), used the same for limits. This is why I used these numbers (graphs from stone-prd-rh01):

image
... looks like steps step-prefetch-dependencies and step-create-trusted-artifact can take about 2 GB (including cache and so)

image
... RSS memory is of-course smaller

image
... regarding CPU it takes about 1 CPU across all containers

image
... again, only containers step-prefetch-dependencies and step-create-trusted-artifact seems to be bigger players here

@hugares
Copy link

hugares commented Dec 17, 2024

the failing check is saying you need to generate some files:

Error: File is out of date, run `hack/generate-ta-tasks.sh` and include the updated file with your changes

@hugares
Copy link

hugares commented Dec 17, 2024

the failing check is saying you need to generate some files:

Error: File is out of date, run `hack/generate-ta-tasks.sh` and include the updated file with your changes

I think this is due to the fact that -ta version does not have same values as non -ta one

@jhutar jhutar force-pushed the fix11-prefetch-resources branch from 890ab47 to 52db40e Compare December 18, 2024 08:20
@jhutar
Copy link
Contributor Author

jhutar commented Dec 18, 2024

Hello @hugares . At the end I had to do some changes in task-generator/trusted-artifacts/ta.go. When I just added ComputeResources to that step in task-generator/trusted-artifacts/golden/prefetch-dependencies/ta.yaml, it did not helped (I was not getting ComputeResources definition in resulting TA task yaml).

Hello @zregvart. Is this correct approach please?

@jhutar jhutar force-pushed the fix11-prefetch-resources branch from 52db40e to bce5491 Compare December 18, 2024 09:58
@jhutar
Copy link
Contributor Author

jhutar commented Dec 18, 2024

For record, this is a failure rate of prefetch-dependencies.* tasks in stone-prd-rh01 in last week and day:
image
image
And average over time:
image
image

@jhutar jhutar force-pushed the fix11-prefetch-resources branch from bce5491 to 651c76f Compare December 18, 2024 11:08
@jhutar jhutar force-pushed the fix11-prefetch-resources branch from 651c76f to faa9fcb Compare December 19, 2024 06:53
@jhutar
Copy link
Contributor Author

jhutar commented Dec 19, 2024

Hello @zregvart. Is this correct approach please?

OK, ignore me Zoran. Looks like yaml in task-generator/trusted-artifacts/golden/prefetch-dependencies/ta.yaml is only used in tests, so changing it will not help with generating yamls, but will help with go test run in CI checks.

Copy link

@hugares hugares left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Contributor

@chmeliik chmeliik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this based based on tests with one specific repo, or do you have data across varied uses of git-clone and prefetch-dependencies?

The memory and cpu usage will vary wildly depending on the size of the repo, the configured package managers, the number of dependencies etc.

Comment on lines +338 to +347
ComputeResources: core.ResourceRequirements{
Requests: core.ResourceList{
core.ResourceCPU: resource.MustParse("1"),
core.ResourceMemory: resource.MustParse("3Gi"),
},
Limits: core.ResourceList{
core.ResourceCPU: resource.MustParse("1"),
core.ResourceMemory: resource.MustParse("3Gi"),
},
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the values here would be part of the recipe, nothing that would block this change, this can be added if/when there is a need to modify this in the future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree (as far as I understand the architecture here) but that would require bigger code change and I did not wanted to go too deep to it for now. But, let me know and I will happily work on it in different PR.

@jhutar
Copy link
Contributor Author

jhutar commented Dec 20, 2024

Hello @chmeliik ! Graphs come from real world builds on stone-prd-rh01.

If the concern is limits are too small, I think they should be OK until some new huge repo appears (I have pasted a graph for whole week of data).

If the concern is the requests (and limits) are too high for small repos without specific requirements, I assume that should be fine, because in that case the pod/task will be running just briefly, so resources will not be allocated for too long.

@chmeliik
Copy link
Contributor

Hello @chmeliik ! Graphs come from real world builds on stone-prd-rh01.

Ack 👍

If the concern is limits are too small, I think they should be OK until some new huge repo appears (I have pasted a graph for whole week of data).

Yeah this was it. Ok, let's see if the current requests and limits are enough

Copy link
Contributor

@chmeliik chmeliik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, could you just update the commit message or split into two commits? This PR updates the computeResources for every create-trusted-artifact step and for one separate step in the prefetch-dependencies task

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants