Define Kubernetes Workflow Resources #121

Merged: teroyks merged 12 commits into master from tero/feature/workflow-resources on Oct 6, 2023

teroyks
Contributor

@teroyks teroyks commented Sep 28, 2023

Related to https://github.com/valohai/meta/issues/275

Add a resources property to Step for defining Kubernetes resources.

All resources (cpu, memory, devices) are required if resources is given.

Example step definition

- step:
    [...]
    resources:
      cpu:
        min: 0.1
        max: 1
      memory:
        min: 50
        max: 100
      devices:
        nvidia.com/gpu: 1
        nvidia.com/cpu: 2

Testing

  • examples/step-with-resources.yaml – contains all resource definition properties
  • examples/step-with-partial-resources.yaml – an example of defining only some of the properties
  • error_examples/step-invalid-resources.yaml – an example of invalid resources that cause a validation error
  • test_workload_resources.py – unit tests for the WorkloadResources object

Open questions

  • Ok to require all resources, or no resources at all?
    • do not require all resources, properties may be omitted
  • Resources now available to all step definitions – is this ok?
    • yes

@teroyks
Contributor Author

teroyks commented Sep 28, 2023

@tomi @ruksi Should the resources be marked as Kubernetes-related somehow?

@codecov-commenter

codecov-commenter commented Sep 28, 2023

Codecov Report

Merging #121 (14b70b9) into master (aa2d92e) will increase coverage by 0.08%.
The diff coverage is 94.59%.

@@            Coverage Diff             @@
##           master     #121      +/-   ##
==========================================
+ Coverage   92.31%   92.40%   +0.08%     
==========================================
  Files          61       63       +2     
  Lines        1914     1988      +74     
  Branches      326      332       +6     
==========================================
+ Hits         1767     1837      +70     
- Misses         71       75       +4     
  Partials       76       76              
Files Coverage Δ
tests/conftest.py 100.00% <100.00%> (ø)
tests/test_workload_resources.py 100.00% <100.00%> (ø)
valohai_yaml/objs/step.py 97.43% <100.00%> (+0.06%) ⬆️
valohai_yaml/objs/workload_resources.py 89.74% <89.74%> (ø)

@ruksi
Member

ruksi commented Sep 28, 2023

> @tomi @ruksi Should the resources be marked as Kubernetes-related somehow?

I wouldn't mark them as Kubernetes-specific. AFAIR we've had other cases where limiting the resources of executions has been on the table (e.g. on a shared instance), so e.g. memory.max could also be utilized there 🤔

@ruksi
Member

ruksi commented Sep 29, 2023

> OK to require all resources, or no resources at all?

I think it's OK to allow partial resource definitions; we would then fill in the rest with defaults (static in roi, or some future default definitions in environments), if that is what you meant?

Having no resource definitions is fine
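
The "fill in the rest with defaults" idea can be pictured as a recursive merge. This is only a minimal sketch, not code from the PR or from roi: the `fill_with_defaults` helper and the `DEFAULT_RESOURCES` values are hypothetical, with the numbers borrowed from the example step definition above.

```python
from copy import deepcopy
from typing import Any, Dict

# Hypothetical defaults (assumption): the real ones would be defined
# wherever the platform keeps them, not in this module.
DEFAULT_RESOURCES: Dict[str, Any] = {
    "cpu": {"min": 0.1, "max": 1},
    "memory": {"min": 50, "max": 100},
    "devices": {},
}


def fill_with_defaults(partial: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `defaults` with the values from `partial` layered on top."""
    merged = deepcopy(defaults)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested mappings (cpu, memory, devices) key by key.
            merged[key] = fill_with_defaults(value, merged[key])
        elif value is not None:
            # A missing (None) property keeps the default.
            merged[key] = value
    return merged


# A step that only sets the CPU upper bound keeps every other default.
print(fill_with_defaults({"cpu": {"max": 4}}, DEFAULT_RESOURCES))
```

An empty or absent `resources` mapping then simply yields the defaults, which matches "having no resource definitions is fine".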

> Resources now available to all step definitions – is this ok?

I think it's fine for them to be usable in all step definitions.

Did you have something in mind where we would need to limit their availability?

@teroyks
Contributor Author

teroyks commented Sep 29, 2023

> Did you have something in mind where we would need to limit their availability?

No, just wanted to make sure there wasn’t some restriction I was unaware of.

@teroyks teroyks force-pushed the tero/feature/workflow-resources branch from fb1122e to fc07c16 Compare October 2, 2023 11:31
Initial commit to check progress -- will be amended later.

Add workflow resources to the step YAML configuration.
Add resources to the parser -> create serializable objects.
If the `resources` key is defined, it must contain all the defined properties.
You can omit any resource (cpu, memory, devices) and their properties as
long as the YAML is valid.
Filename reflected old class name
Workload properties (and sub-properties) are optional; make sure missing
properties have a value of None on every level.
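
As a rough illustration of "None on every level", the sketch below builds a resources object from a partial mapping. The names `MinMax`, `ResourcesSketch` and `resources_from_dict` are hypothetical and are not the classes added in this PR; the only behavior taken from the commit message is that omitted resources and omitted sub-properties end up as None.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class MinMax:
    """A min/max pair; either bound may be left out."""
    min: Optional[float] = None
    max: Optional[float] = None


@dataclass
class ResourcesSketch:
    """Hypothetical container, not the actual WorkloadResources class."""
    cpu: MinMax = field(default_factory=MinMax)
    memory: MinMax = field(default_factory=MinMax)
    devices: Optional[Dict[str, int]] = None


def resources_from_dict(data: Optional[Dict[str, Any]]) -> ResourcesSketch:
    """Build the object so that anything omitted in the YAML becomes None."""
    data = data or {}
    cpu = data.get("cpu") or {}
    memory = data.get("memory") or {}
    return ResourcesSketch(
        cpu=MinMax(min=cpu.get("min"), max=cpu.get("max")),
        memory=MinMax(min=memory.get("min"), max=memory.get("max")),
        devices=data.get("devices"),
    )


# Only cpu.max is given; every other property is None.
partial = resources_from_dict({"cpu": {"max": 1}})
print(partial.cpu.min, partial.cpu.max, partial.memory.max, partial.devices)
# prints: None 1 None None
```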
@teroyks teroyks marked this pull request as ready for review October 4, 2023 11:55
@teroyks
Contributor Author

teroyks commented Oct 4, 2023

Will tidy up the commit history before merging.

@teroyks teroyks requested a review from ruksi October 4, 2023 11:55
@teroyks teroyks self-assigned this Oct 4, 2023
Member

@ruksi ruksi left a comment

I think the new objects do not implement the established Item interface, see comments

valohai_yaml/objs/workload_resources.py (outdated, resolved)
valohai_yaml/objs/workload_resources.py (outdated, resolved)
@teroyks teroyks marked this pull request as draft October 5, 2023 09:04
@teroyks teroyks force-pushed the tero/feature/workflow-resources branch from fc07c16 to e750834 Compare October 5, 2023 09:08
Due to some pending changes in the typing, the subresources need to be
introduced first in the module.

This commit is just to help review the changes made to the classes in
the next commit (Git is not good with reorganization and changes done in
the same commit).
Non-idiomatic use of Item classes changed to use the `parse` method for
constructing the object, and type parameters for the classes (instead of
dicts).
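
The change described in this commit message might look roughly like the following. This is a sketch under assumptions: `CpuSketch` is a hypothetical stand-in, and the real Item interface in valohai_yaml may differ in its signatures. It only shows the general shape, a constructor with typed parameters plus a `parse` classmethod that builds the object from YAML data.

```python
from typing import Any, Dict, Optional


class CpuSketch:
    """Hypothetical stand-in for a CPU resource object."""

    def __init__(self, *, min: Optional[float] = None, max: Optional[float] = None) -> None:
        # Typed keyword parameters instead of a single raw dict argument.
        # The names shadow builtins on purpose so they match the YAML keys.
        self.min = min
        self.max = max

    @classmethod
    def parse(cls, data: Dict[str, Any]) -> "CpuSketch":
        # Objects are constructed from parsed YAML through `parse`.
        return cls(min=data.get("min"), max=data.get("max"))

    def serialize(self) -> Dict[str, Any]:
        # Leave out unset bounds so a round trip reproduces the input.
        return {k: v for k, v in (("min", self.min), ("max", self.max)) if v is not None}


cpu = CpuSketch.parse({"min": 0.1, "max": 1})
print(cpu.serialize())
```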
@teroyks teroyks marked this pull request as ready for review October 5, 2023 14:10
@teroyks teroyks requested a review from ruksi October 5, 2023 14:10
Member

@ruksi ruksi left a comment

comments + as mentioned, types might need a refresh

valohai_yaml/objs/step.py (outdated, resolved)
valohai_yaml/objs/workload_resources.py (outdated, resolved)
valohai_yaml/objs/workload_resources.py (outdated, resolved)
teroyks added a commit that referenced this pull request Oct 6, 2023
Re: review comment at
#121 (comment)

Overriding the `parse` method is not needed for cpu and memory, where the properties are known (the parameter name just needs to match the YAML property).
teroyks added a commit that referenced this pull request Oct 6, 2023
Correctly define workflow resources parameter type [1]
- fixing parsing makes this just work

Use SerializedDict as device resources init type [2]
- passed from parse

[1] #121 (comment)
[2] #121 (comment)
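
The two commit messages above can be read together as follows. The sketch assumes a base class whose `parse` passes the YAML keys straight through as keyword arguments; `ItemSketch`, `MemorySketch` and `DevicesSketch` are hypothetical names, and the real base class and `SerializedDict` type in valohai_yaml are not reproduced here.

```python
from typing import Any, Dict, Optional


class ItemSketch:
    """Hypothetical base class that only mimics the idea of a shared `parse`."""

    @classmethod
    def parse(cls, data: Dict[str, Any]):
        # Works whenever every YAML key is also a constructor parameter name.
        return cls(**data)


class MemorySketch(ItemSketch):
    # No `parse` override: the keys `min` and `max` are known in advance
    # and match the parameter names.
    def __init__(self, min: Optional[int] = None, max: Optional[int] = None) -> None:
        self.min = min
        self.max = max


class DevicesSketch(ItemSketch):
    def __init__(self, devices: Dict[str, Any]) -> None:
        self.devices = dict(devices)

    @classmethod
    def parse(cls, data: Dict[str, Any]) -> "DevicesSketch":
        # Device names such as "nvidia.com/gpu" are arbitrary and cannot be
        # parameter names, so the whole mapping is passed to __init__.
        return cls(data)


memory = MemorySketch.parse({"max": 100})
devices = DevicesSketch.parse({"nvidia.com/gpu": 1})
print(memory.min, memory.max, devices.devices)
```

Only the devices class needs its own `parse`, because its keys are not known ahead of time.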
@teroyks teroyks force-pushed the tero/feature/workflow-resources branch from 0f85766 to 14b70b9 Compare October 6, 2023 08:29
@teroyks teroyks requested a review from ruksi October 6, 2023 08:30
Member

@ruksi ruksi left a comment

I think it looks good now, rebase and we can go ahead! 💯

valohai_yaml/objs/workload_resources.py (resolved)
@teroyks teroyks merged commit d088e66 into master Oct 6, 2023
6 checks passed
@teroyks teroyks deleted the tero/feature/workflow-resources branch October 6, 2023 10:26
3 participants