
create datasetBlockages collection + block datasets #2933

Merged
6 commits merged into main from create-blocked-datasets-collection on Jun 20, 2024

Conversation

@severo (Collaborator) commented Jun 20, 2024

We apply rate limiting on the jobs, based on the total duration in a window (see #2279 (comment)).
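As a rough illustration of how such a blockage list can gate job selection, here is a minimal pymongo sketch. The collection name datasetBlockages comes from this PR's title; the other names (jobs, status, priority, created_at) are hypothetical, and the actual schema may differ:

```python
# Minimal sketch: filter blocked datasets out of the waiting-job query.
# Hypothetical collection and field names; not the PR's actual implementation.
from pymongo import MongoClient

client = MongoClient()
db = client["queue"]

def next_waiting_job():
    """Return the next waiting job whose dataset is not currently blocked."""
    blocked = [doc["dataset"] for doc in db.datasetBlockages.find({}, {"dataset": 1})]
    return db.jobs.find_one(
        {"status": "waiting", "dataset": {"$nin": blocked}},
        sort=[("priority", -1), ("created_at", 1)],
    )
```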

Follows #2931

@severo changed the title from "create datasetBlockages collection" to "create datasetBlockages collection + block datasets" on Jun 20, 2024
@severo marked this pull request as ready for review on June 20, 2024 17:10
@lhoestq (Member) left a comment


Looks all good to me, though I'm concerned that it might slow down the next_waiting_job query if there are blocked datasets with thousands of their jobs in the queue (it will have to iterate over them all to filter them and get back a job from an unblocked dataset).

Anyway, if that's the case, it can be fixed by defining a target_start_time that is equal to created_at by default and that can be changed to a later value (e.g. now + 1h) if a dataset is blocked.
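For illustration, a sketch of that alternative under the same hypothetical schema: instead of filtering blocked datasets on every call, their jobs are simply pushed out of the query's time range (the one-hour delay is an arbitrary assumption):

```python
# Sketch of the target_start_time idea (hypothetical fields and delay).
from datetime import datetime, timedelta

def next_waiting_job(db):
    """Pick the next startable job; blocked datasets' jobs sort out of range."""
    return db.jobs.find_one(
        {"status": "waiting", "target_start_time": {"$lte": datetime.utcnow()}},
        sort=[("priority", -1), ("target_start_time", 1)],
    )

def postpone_jobs(db, dataset: str, delay: timedelta = timedelta(hours=1)):
    """On blocking a dataset, push its waiting jobs to a later target_start_time."""
    db.jobs.update_many(
        {"dataset": dataset, "status": "waiting"},
        {"$set": {"target_start_time": datetime.utcnow() + delay}},
    )
```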

@severo (Collaborator, Author) commented Jun 20, 2024

> it might slow down the next_waiting_job query if there are blocked datasets with thousands of their jobs in the queue (it will have to iterate over them all to filter them and get back a job from an unblocked dataset).

Yes, I thought the same. I just changed it to drop the short jobs (they shouldn't have much impact anyway) and to store the durations as int, not float.

@AndreaFrancis (Contributor) left a comment


LGTM, but I would suggest adding some tests with memory limits to ensure that it won't affect the next_waiting_job function too much.

Base automatically changed from create-past-jobs-collection to main June 20, 2024 20:24
- store durations as int, not float
- only store durations > 30 seconds
- only check if the dataset should be blocked after long jobs (> 5 min)
- express the rate as MAX_MACHINES (max allowed number of dedicated machines)
@severo force-pushed the create-blocked-datasets-collection branch from 5c0cea7 to b0cb05f on June 20, 2024 20:26
@severo force-pushed the create-blocked-datasets-collection branch from b0cb05f to 0506abf on June 20, 2024 20:27
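Taken together, the refinements in the commit message above could look roughly like this on the job-termination path. This is a hedged sketch only: the 30-second and 5-minute thresholds come from the commit message, while the MAX_MACHINES value, the window length, and the collection/field names are assumptions:

```python
# Sketch of the job-termination path with the commit's refinements applied.
from datetime import datetime, timedelta

from pymongo import MongoClient

DURATION_MIN_S = 30            # per the commit message: skip storing very short jobs
CHECK_THRESHOLD_S = 5 * 60     # per the commit message: only re-check after long jobs
MAX_MACHINES = 2               # hypothetical value: rate as dedicated machines
WINDOW = timedelta(hours=1)    # assumption: length of the sliding window

client = MongoClient()
db = client["queue"]

def on_job_finished(dataset: str, duration: float) -> None:
    duration_s = int(duration)  # store as int, not float
    if duration_s <= DURATION_MIN_S:
        return  # short jobs are not recorded at all
    db.pastJobs.insert_one(
        {"dataset": dataset, "duration": duration_s, "finished_at": datetime.utcnow()}
    )
    if duration_s <= CHECK_THRESHOLD_S:
        return  # only long jobs trigger a re-evaluation of the blockage
    # block the dataset when its jobs would keep more than MAX_MACHINES
    # busy over the window, i.e. total duration > MAX_MACHINES * window length
    since = datetime.utcnow() - WINDOW
    pipeline = [
        {"$match": {"dataset": dataset, "finished_at": {"$gt": since}}},
        {"$group": {"_id": None, "total": {"$sum": "$duration"}}},
    ]
    result = list(db.pastJobs.aggregate(pipeline))
    total = result[0]["total"] if result else 0
    if total > MAX_MACHINES * WINDOW.total_seconds():
        db.datasetBlockages.update_one(
            {"dataset": dataset},
            {"$set": {"blocked_at": datetime.utcnow()}},
            upsert=True,
        )
```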
@severo (Collaborator, Author) commented Jun 20, 2024

Hmmm, I think it will have nearly no impact, since the collection of blocked datasets should have fewer than 10 entries if everything works well. The only "expensive" operation we add is computing the sum of the durations (which I optimized a bit with #2933 (comment)), and it runs on job termination, not on job start.

@severo (Collaborator, Author) commented Jun 20, 2024

The CI error is due to issues on the Hub. Merging.

@severo merged commit e1697d2 into main on Jun 20, 2024 (21 of 23 checks passed)
@severo deleted the create-blocked-datasets-collection branch on June 20, 2024 20:48
@severo mentioned this pull request on Jun 20, 2024