Bind lambda puller to batch hardware specs #37

nolan1999 · 2023-09-13T12:22:28Z

~~There is a bit of confusion as to what "Hardware" means at the moment:~~

~~The user can specify the hardware they need, so as to filter the workers~~

~~version_config defines how much hardware a version needs, as specs for the AWS job definition~~

~~The AWS job puller's config also has hardware specs, to pull jobs. This is kind of a duplicate of the ones in version_config...~~

Maybe we should also have default hardware per version.

~~Note that memory can be overridden when submitting a job, see https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/batch/client/submit_job.html~~

Things as they are now:

- The user might specify hardware requirements
- The batch jobs (cloud workers) get their requirements from the user (if defined) or from version_config (if not)
- Local workers hope to have enough if the user did not specify anything
- GPU model, architecture, and memory are not meaningfully used (only string-matched when pulling)
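To make the fallback concrete, here is a rough sketch of how the requirements could be resolved. `HardwareSpecs`, its field names, and `resolve_hardware` are hypothetical names for illustration, not the existing code, and the per-field fallback is an assumption (the current behaviour might well be all-or-nothing).

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class HardwareSpecs:
    """Hypothetical container for hardware requirements; None means 'not specified'."""
    cpu_cores: Optional[int] = None
    memory: Optional[int] = None      # MiB
    gpu_model: Optional[str] = None
    gpu_archi: Optional[str] = None
    gpu_mem: Optional[int] = None     # MiB


def resolve_hardware(user: Optional[HardwareSpecs], defaults: HardwareSpecs) -> HardwareSpecs:
    """Take each field from the user if set, otherwise from the version_config defaults."""
    if user is None:
        return defaults
    merged = {}
    for f in fields(HardwareSpecs):
        user_value = getattr(user, f.name)
        merged[f.name] = user_value if user_value is not None else getattr(defaults, f.name)
    return HardwareSpecs(**merged)


# Example: the user only asks for more memory, the rest comes from the defaults.
version_defaults = HardwareSpecs(cpu_cores=2, memory=4096)
resolved = resolve_hardware(HardwareSpecs(memory=8192), version_defaults)
```

With something like this, both cloud and local workers would always see fully resolved requirements instead of hoping to have enough.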

Should do:

- If the user does not specify any requirements, the defaults from version_config should be used
- Do the lambdas even need the version_config then? To be checked, but apparently they only read the hardware specs and the timeout from it: hardware is solved by the point above, and the timeout could be passed along as well (and made specifiable by the user, filterable, and sortable)
- The batch puller pulls based on the maximum it can offer; based on the requirements, it overrides the job definition defaults with the right instance type, memory, and vcpu (which does not necessarily match cpu_cores...)
- For this, the batch environment needs a mapper instance type => (max cpu cores, gpu model, gpu architecture, gpu memory), and a function that, based on the requirements, returns the cheapest instance type to use
- Local workers keep working as they do now (the cluster might need a similar mapping as well)
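A minimal sketch of the mapper and the "cheapest instance" function mentioned above. Everything here is hypothetical: the instance names, specs, and costs in `CATALOG` are placeholders rather than real AWS data, and `InstanceSpecs`, `satisfies`, and `cheapest_instance` are illustrative names. It also assumes numeric fields are compared as minimums while GPU model and architecture are matched exactly.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class InstanceSpecs:
    """What one instance type can offer at most (hypothetical structure)."""
    cpu_cores: int
    memory: int                 # MiB
    gpu_model: Optional[str]
    gpu_archi: Optional[str]
    gpu_mem: Optional[int]      # MiB
    hourly_cost: float          # placeholder cost, only used for ranking


# Placeholder catalog: names and numbers are made up for illustration.
CATALOG: Dict[str, InstanceSpecs] = {
    "cpu-small": InstanceSpecs(4, 16384, None, None, None, 0.2),
    "gpu-mid": InstanceSpecs(8, 32768, "model-a", "archi-x", 16384, 1.0),
    "gpu-large": InstanceSpecs(16, 65536, "model-b", "archi-y", 40960, 3.0),
}


def satisfies(spec, cpu_cores=None, memory=None, gpu_model=None, gpu_archi=None, gpu_mem=None):
    """Return True if the instance meets every requirement that is not None."""
    if cpu_cores is not None and spec.cpu_cores < cpu_cores:
        return False
    if memory is not None and spec.memory < memory:
        return False
    if gpu_model is not None and spec.gpu_model != gpu_model:
        return False
    if gpu_archi is not None and spec.gpu_archi != gpu_archi:
        return False
    if gpu_mem is not None and (spec.gpu_mem or 0) < gpu_mem:
        return False
    return True


def cheapest_instance(catalog, **requirements):
    """Return the name of the cheapest instance type meeting the requirements, or None."""
    candidates = [
        (spec.hourly_cost, name)
        for name, spec in catalog.items()
        if satisfies(spec, **requirements)
    ]
    return min(candidates)[1] if candidates else None
```

For example, `cheapest_instance(CATALOG, gpu_mem=8192)` picks `"gpu-mid"` from the placeholder catalog, and a request nothing can satisfy returns `None`, which the puller could use to skip the job. The same catalog would also give the puller the maximum it can offer.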

nolan1999 · 2023-09-13T12:45:25Z

Also, do we want to have multiple types of GPUs running, i.e., multiple job definitions (and pulling) in AWS Batch?
At submit_job, one can specify the instance to use, ...
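For reference, a sketch of how per-job overrides could be assembled for boto3's `submit_job`. The helper `build_submit_job_kwargs` and its parameters are hypothetical; the `containerOverrides.resourceRequirements` entries with types `VCPU`, `MEMORY` (MiB), and `GPU` and string values follow the boto3 Batch documentation as I understand it. As far as I can tell, `containerOverrides.instanceType` only applies to multi-node parallel jobs, so choosing an instance type for regular jobs would probably go through separate job definitions or queues; worth double-checking.

```python
from typing import Any, Dict, Optional


def build_submit_job_kwargs(
    job_name: str,
    job_queue: str,
    job_definition: str,
    vcpus: Optional[int] = None,
    memory_mib: Optional[int] = None,
    gpus: Optional[int] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for batch.submit_job, overriding only what is given."""
    requirements = []
    if vcpus is not None:
        requirements.append({"type": "VCPU", "value": str(vcpus)})
    if memory_mib is not None:
        requirements.append({"type": "MEMORY", "value": str(memory_mib)})
    if gpus is not None:
        requirements.append({"type": "GPU", "value": str(gpus)})

    kwargs: Dict[str, Any] = {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,  # one definition per GPU type would be selected here
    }
    if requirements:
        # Without overrides, the job definition defaults apply.
        kwargs["containerOverrides"] = {"resourceRequirements": requirements}
    return kwargs


# Usage (not executed here, needs AWS credentials):
#   import boto3
#   boto3.client("batch").submit_job(**build_submit_job_kwargs(...))
```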

nolan1999 · 2023-12-23T00:37:38Z

~~Will somehow need a mapper from gpu_archi to instances...~~

Haydnspass changed the title ~~Hardware specs~~ Bind lambda puller to batch hardware specs Sep 15, 2023

nolan1999 assigned Haydnspass and nolan1999 Jan 12, 2024
