Improve throughput of utask_main scheduling #4461

Merged: 16 commits, Nov 28, 2024

jonathanmetzman (Collaborator) commented Nov 28, 2024

  1. Don't defer the tasks; they have already been deferred.
  2. Cut scheduling time by 20% by batching calls, so that we read the YAML and query the database only once per queue read.

vitorguidi and others added 15 commits November 25, 2024 16:20
…g filing (#4415)

### Motivation

At Chrome's request, we want to know how long it takes for an issue to
be opened, measured from the moment a testcase is created.

Part of #4271
…#4414)

### Motivation

Chrome folks need to know how long on average a fuzzer takes to generate
a testcase. This PR implements that.

Part of #4271
### Motivation

Cumulative distribution metrics from the monitoring initiative were
incorrectly set to use the fixed-width bucketer and/or width=0.05 and
max_buckets=20. This caused percentile metrics to cap at 1, which is
incorrect.

This PR attempts to fix that by moving them all to the Geometric
Bucketer, without the aforementioned limits. It also reverts #4429,
since it apparently broke triage.py in Chrome.

Part of #4271
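
To illustrate why those settings cap percentiles at 1, here is a minimal sketch. The helper names (`fixed_width_bounds`, `geometric_bounds`, `estimate_percentile`), the growth factor of 2, and the sample values are all assumptions for illustration; they are not the monitoring library's API. With width=0.05 and 20 buckets the largest finite bound is 0.05 * 20 = 1, so any larger sample falls into the overflow bucket and the estimate cannot exceed 1.

```python
import bisect


def fixed_width_bounds(width, num_buckets):
    """Upper bounds of the finite buckets: width, 2*width, ... (hypothetical helper)."""
    return [width * (i + 1) for i in range(num_buckets)]


def geometric_bounds(growth_factor, num_buckets, scale=1.0):
    """Upper bounds scale * growth_factor**i (hypothetical helper)."""
    return [scale * growth_factor**i for i in range(num_buckets)]


def estimate_percentile(samples, bounds, fraction):
    """Estimates a percentile as the upper bound of the bucket that contains it.

    Samples above the last bound land in an overflow bucket, so the estimate
    can never be larger than bounds[-1].
    """
    counts = [0] * (len(bounds) + 1)  # The last slot is the overflow bucket.
    for value in samples:
        counts[bisect.bisect_left(bounds, value)] += 1

    target = fraction * len(samples)
    running = 0
    for index, count in enumerate(counts):
        running += count
        if running >= target:
            return bounds[min(index, len(bounds) - 1)]
    return bounds[-1]


# Made-up durations in seconds, all far above 1.
samples = [30, 45, 60, 120]

fixed = fixed_width_bounds(width=0.05, num_buckets=20)
geometric = geometric_bounds(growth_factor=2, num_buckets=20)

capped_p50 = estimate_percentile(samples, fixed, 0.5)
geometric_p50 = estimate_percentile(samples, geometric, 0.5)
```

In this sketch `capped_p50` is pinned to the top fixed-width bound, while `geometric_p50` lands in a bucket near the real median because the geometric bounds span several orders of magnitude.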
Fetching the testcase from the datastore has been wrongly moved before
the call to `_update_testcase`. This leads to us holding an outdated
version of the testcase.
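
A minimal sketch of the ordering problem, assuming a datastore read returns a snapshot. The dict-backed store and the function signatures are stand-ins for illustration; only the name `_update_testcase` comes from the commit message, and its real signature is not shown here.

```python
import copy


def get_testcase(datastore, testcase_id):
    """Returns a snapshot of the stored entity (stand-in for a datastore read)."""
    return copy.deepcopy(datastore[testcase_id])


def _update_testcase(datastore, testcase_id):
    """Stand-in for the real update: mutates the stored entity."""
    datastore[testcase_id]['status'] = 'Processed'


def fetch_then_update(datastore, testcase_id):
    """Buggy order: the snapshot predates the update, so it is stale."""
    testcase = get_testcase(datastore, testcase_id)
    _update_testcase(datastore, testcase_id)
    return testcase


def update_then_fetch(datastore, testcase_id):
    """Fixed order: fetch after the update so the copy is current."""
    _update_testcase(datastore, testcase_id)
    return get_testcase(datastore, testcase_id)


stale = fetch_then_update({1: {'status': 'Pending'}}, 1)
fresh = update_then_fetch({1: {'status': 'Pending'}}, 1)
```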
…4437)

Some parts of CF will chdir, breaking relative paths.
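
A minimal sketch of the failure mode, using only the standard library. The file name and directories are made up; the point is that a relative path is re-resolved against whatever the current directory is at the time of use, whereas a path made absolute up front keeps working after a later `os.chdir`.

```python
import os
import tempfile


def demo_chdir_breaks_relative_path():
    """Returns (relative_still_works, absolute_still_works) after a chdir."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as base_dir, \
         tempfile.TemporaryDirectory() as other_dir:
        with open(os.path.join(base_dir, 'config.txt'), 'w') as handle:
            handle.write('ok')
        try:
            os.chdir(base_dir)
            relative = 'config.txt'
            # Resolve once, while the current directory is still base_dir.
            absolute = os.path.abspath(relative)

            # Some other code changes the working directory.
            os.chdir(other_dir)
            relative_works = os.path.exists(relative)
            absolute_works = os.path.exists(absolute)
        finally:
            # Restore the cwd before the temporary directories are removed.
            os.chdir(original_cwd)
    return relative_works, absolute_works


result = demo_chdir_breaks_relative_path()
```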
### Motivation

Chrome folks want to know what build revision is being used in fuzz
task. This PR implements that.
1. Don't unnecessarily list features about blobs other than names
2. Memoize leak blacklist stuff.
3. Properly skip cleanup.
Previously we tried to handle it by finding the fuzz targets in the
list. But we still failed because it wasn't in the db. Save it to the db
to solve this problem.
Getting credentials is pretty slow since it launches a gcloud process.
We should just cache credentials. Supposedly the library itself handles
refreshes. We are also caching in storage.py so there's proof it's safe.
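
A minimal sketch of the caching idea, assuming the credentials object refreshes itself as the message above suggests. `_slow_fetch_credentials` is a hypothetical stand-in for the call that launches a gcloud process; it is not a real ClusterFuzz or google-auth function, and the actual change may cache differently.

```python
import functools

# Counts how often the expensive fetch actually runs.
FETCH_CALLS = {'count': 0}


def _slow_fetch_credentials():
    """Hypothetical stand-in for the slow, process-launching credentials call."""
    FETCH_CALLS['count'] += 1
    return {'token': 'placeholder'}


@functools.lru_cache(maxsize=None)
def get_credentials():
    """Fetches credentials once per process and reuses the same object after that."""
    return _slow_fetch_credentials()


first = get_credentials()
second = get_credentials()
```

Both calls return the same object and the slow fetch runs only once. This is only safe if the cached object handles token refresh on its own, which is the assumption stated in the commit message.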
We want tworkers to skip the function, not non-tworkers.
It's important for speed, probably safe on tworkers, and it will be very
easy to notice if things go wrong since the preprocess queue will pile
up.
This PR also fixes a bad bug where the max_pool_size was not obeyed, but
in practice since this code never runs on machines with more than 2
cores, it's unlikely to matter.
Partially undoes #4430
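
The message does not show how `max_pool_size` was being ignored, so the following is only a sketch of what obeying such a limit usually looks like. `choose_pool_size` and its parameters are hypothetical names, not the actual code.

```python
def choose_pool_size(cpu_count, max_pool_size):
    """Uses one worker per core, but never more than max_pool_size and never fewer than 1."""
    return max(1, min(cpu_count, max_pool_size))


# On a 2-core machine a limit of 4 changes nothing, which matches the
# observation that the bug rarely mattered in practice.
small_machine = choose_pool_size(cpu_count=2, max_pool_size=4)
# On a larger machine the limit is what keeps the pool small.
large_machine = choose_pool_size(cpu_count=16, max_pool_size=4)
```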
Make some of the helper functions for batch task creation run on
multiple tasks at once, so that we don't need to parse YAML
or query the database as often.
Also, don't defer utask_mains. Deferring/delaying already happened
in the preprocess step.
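
A minimal sketch of the batching idea described above. `FakeBackend`, its methods, the task tuples, and the queue and platform values are all hypothetical stand-ins; the real helpers and their signatures are not shown in this PR description. The counters make the saving visible: per-task helpers pay one YAML read and one query per task, while the batched helper pays each cost once per batch.

```python
class FakeBackend:
    """Stand-in that counts the expensive operations."""

    def __init__(self):
        self.yaml_reads = 0
        self.db_queries = 0

    def load_config(self):
        """Pretends to parse a YAML config file."""
        self.yaml_reads += 1
        return {'default_queue': 'jobs-linux'}

    def query_jobs(self, job_names):
        """Pretends to fetch job entities for all given names in one query."""
        self.db_queries += 1
        return {name: {'platform': 'LINUX'} for name in job_names}


def build_specs_one_by_one(backend, tasks):
    """Per-task helper: one YAML read and one query for every task."""
    specs = []
    for command, job_name in tasks:
        config = backend.load_config()
        job = backend.query_jobs([job_name])[job_name]
        specs.append((command, job_name, job['platform'],
                      config['default_queue']))
    return specs


def build_specs_batched(backend, tasks):
    """Batched helper: one YAML read and one query for the whole batch."""
    config = backend.load_config()
    jobs = backend.query_jobs({job_name for _, job_name in tasks})
    return [(command, job_name, jobs[job_name]['platform'],
             config['default_queue']) for command, job_name in tasks]


tasks = [('fuzz', 'job_a'), ('fuzz', 'job_b'), ('progression', 'job_a')]

slow_backend = FakeBackend()
slow_specs = build_specs_one_by_one(slow_backend, tasks)

fast_backend = FakeBackend()
fast_specs = build_specs_batched(fast_backend, tasks)
```

Both helpers produce the same specs; only the number of config reads and queries differs.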
jonathanmetzman changed the title from "Schedulespeed" to "Improve throughput of utask_main scheduling" Nov 28, 2024
jonathanmetzman merged commit 144bcf1 into oss-fuzz Nov 28, 2024
3 checks passed
jonathanmetzman deleted the schedulespeed branch November 28, 2024 01:48
jonathanmetzman added a commit that referenced this pull request Dec 16, 2024
1. Don't defer the tasks; they have already been deferred.
2. Cut scheduling time by 20% by batching calls, so that we read the
YAML and query the database only once per queue read.

---------

Co-authored-by: Vitor Guidi <[email protected]>
Co-authored-by: Ali HIJAZI <[email protected]>
jonathanmetzman added a commit that referenced this pull request Jan 8, 2025

3 participants