
create datasetBlockages collection + block datasets #2933

Merged
6 commits merged into main from create-blocked-datasets-collection on Jun 20, 2024

Conversation

@severo (Collaborator) commented Jun 20, 2024

We apply rate limiting on the jobs, based on the total duration in a window (see #2279 (comment)).
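As a rough illustration of how such a blockage list can gate job selection, here is a minimal pymongo sketch. The collection name datasetBlockages comes from this PR's title; the other names (jobs, status, priority, created_at) are hypothetical, and the actual schema may differ:

```python
# Minimal sketch: filter blocked datasets out of the waiting-job query.
# Hypothetical collection and field names; not the PR's actual implementation.
from pymongo import MongoClient

client = MongoClient()
db = client["queue"]

def next_waiting_job():
    """Return the next waiting job whose dataset is not currently blocked."""
    blocked = [doc["dataset"] for doc in db.datasetBlockages.find({}, {"dataset": 1})]
    return db.jobs.find_one(
        {"status": "waiting", "dataset": {"$nin": blocked}},
        sort=[("priority", -1), ("created_at", 1)],
    )
```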

Follows #2931

@severo changed the title from "create datasetBlockages collection" to "create datasetBlockages collection + block datasets" on Jun 20, 2024
@severo marked this pull request as ready for review on June 20, 2024 17:10
@lhoestq (Member) left a comment


Looks all good to me, though I'm concerned that it might slow down the next_waiting_job query if there are blocked datasets with thousands of their jobs in the queue (it will have to iterate over them all to filter them and get back a job from an unblocked dataset).

Anyway, if that's the case, it can be fixed by defining a target_start_time that is equal to created_at by default and that can be changed to a later value (e.g. now + 1h) if a dataset is blocked.
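For illustration, a sketch of that alternative under the same hypothetical schema: instead of filtering blocked datasets on every call, their jobs are simply pushed out of the query's time range (the one-hour delay is an arbitrary assumption):

```python
# Sketch of the target_start_time idea (hypothetical fields and delay).
from datetime import datetime, timedelta

def next_waiting_job(db):
    """Pick the next startable job; blocked datasets' jobs sort out of range."""
    return db.jobs.find_one(
        {"status": "waiting", "target_start_time": {"$lte": datetime.utcnow()}},
        sort=[("priority", -1), ("target_start_time", 1)],
    )

def postpone_jobs(db, dataset: str, delay: timedelta = timedelta(hours=1)):
    """On blocking a dataset, push its waiting jobs to a later target_start_time."""
    db.jobs.update_many(
        {"dataset": dataset, "status": "waiting"},
        {"$set": {"target_start_time": datetime.utcnow() + delay}},
    )
```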

@severo (Collaborator, Author) commented Jun 20, 2024

> it might slow down the next_waiting_job query if there are blocked datasets with thousands of their jobs in the queue (it will have to iterate over them all to filter them and get back a job from an unblocked dataset).

Yes, I thought the same. I just changed it to drop the short jobs (they shouldn't have much impact anyway) and to store the durations as int, not float.

@AndreaFrancis (Contributor) left a comment


LGTM, but I would suggest adding some tests with memory limits to ensure that it won't affect the next_waiting_job function too much.

Base automatically changed from create-past-jobs-collection to main June 20, 2024 20:24
- store durations as int, not float
- only store durations > 30 seconds
- only check if the dataset should be blocked after long jobs (> 5 min)
- express the rate as MAX_MACHINES (max allowed number of dedicated machines)
@severo force-pushed the create-blocked-datasets-collection branch from 5c0cea7 to b0cb05f on June 20, 2024 20:26
@severo force-pushed the create-blocked-datasets-collection branch from b0cb05f to 0506abf on June 20, 2024 20:27
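Taken together, the refinements in the commit message above could look roughly like this on the job-termination path. This is a hedged sketch only: the 30-second and 5-minute thresholds come from the commit message, while the MAX_MACHINES value, the window length, and the collection/field names are assumptions:

```python
# Sketch of the job-termination path with the commit's refinements applied.
from datetime import datetime, timedelta

from pymongo import MongoClient

DURATION_MIN_S = 30            # per the commit message: skip storing very short jobs
CHECK_THRESHOLD_S = 5 * 60     # per the commit message: only re-check after long jobs
MAX_MACHINES = 2               # hypothetical value: rate as dedicated machines
WINDOW = timedelta(hours=1)    # assumption: length of the sliding window

client = MongoClient()
db = client["queue"]

def on_job_finished(dataset: str, duration: float) -> None:
    duration_s = int(duration)  # store as int, not float
    if duration_s <= DURATION_MIN_S:
        return  # short jobs are not recorded at all
    db.pastJobs.insert_one(
        {"dataset": dataset, "duration": duration_s, "finished_at": datetime.utcnow()}
    )
    if duration_s <= CHECK_THRESHOLD_S:
        return  # only long jobs trigger a re-evaluation of the blockage
    # block the dataset when its jobs would keep more than MAX_MACHINES
    # busy over the window, i.e. total duration > MAX_MACHINES * window length
    since = datetime.utcnow() - WINDOW
    pipeline = [
        {"$match": {"dataset": dataset, "finished_at": {"$gt": since}}},
        {"$group": {"_id": None, "total": {"$sum": "$duration"}}},
    ]
    result = list(db.pastJobs.aggregate(pipeline))
    total = result[0]["total"] if result else 0
    if total > MAX_MACHINES * WINDOW.total_seconds():
        db.datasetBlockages.update_one(
            {"dataset": dataset},
            {"$set": {"blocked_at": datetime.utcnow()}},
            upsert=True,
        )
```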
@severo (Collaborator, Author) commented Jun 20, 2024

Hmmm, I think it will have nearly no impact, since the collection of blocked datasets should have fewer than 10 entries if everything works well. The only "expensive" operation we add is computing the sum of the durations (which I optimized a bit with #2933 (comment)), and it runs on job termination, not on job start.

@severo (Collaborator, Author) commented Jun 20, 2024

The CI error is due to issues on the Hub. Merging.

@severo merged commit e1697d2 into main on Jun 20, 2024 (21 of 23 checks passed)
@severo deleted the create-blocked-datasets-collection branch on June 20, 2024 20:48
@severo mentioned this pull request on Jun 20, 2024