Scanner should stop queueing work #582

Closed
meejah opened this issue Nov 2, 2021 · 4 comments
Labels: enhancement (New feature or request), performance

Comments

meejah (Collaborator) commented Nov 2, 2021

If "a lot" of files are added at once, the scanner should have some sort of high-water mark where it stops adding new files for some time (e.g. until a low-water mark is reached).

Adding ~10000 new files at once will cause a lot of uploads to get queued, consuming CPU and memory.
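
A minimal sketch of the kind of high/low water-mark gate meant here (Twisted-style; the `ScanGate` name, thresholds and methods are illustrative assumptions, not the actual magic-folder API):

```python
from twisted.internet.defer import Deferred, succeed

class ScanGate(object):
    """
    Pause the scanner once 'high' items are queued and let it resume
    only after the backlog drains down to 'low'.
    (Illustrative sketch only; not the real magic-folder code.)
    """
    def __init__(self, high=1000, low=100):
        self._high = high
        self._low = low
        self._pending = 0
        self._waiters = []

    def item_queued(self):
        self._pending += 1

    def item_done(self):
        self._pending -= 1
        if self._pending <= self._low and self._waiters:
            waiters, self._waiters = self._waiters, []
            for d in waiters:
                d.callback(None)

    def wait_if_full(self):
        """
        Return a Deferred that fires immediately while we are under the
        high-water mark, otherwise only once we've drained to 'low'.
        """
        if self._pending < self._high:
            return succeed(None)
        d = Deferred()
        self._waiters.append(d)
        return d
```

The scanner would wait on `wait_if_full()` before queueing each newly-discovered file and call `item_done()` as each upload finishes, so discovery pauses at the high-water mark and resumes once the backlog drains to the low-water mark.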

meejah added the enhancement (New feature or request) label Nov 2, 2021

meejah (Collaborator, Author) commented Nov 3, 2021

In concrete terms, adding 2271 files and doing an explicit scan took 40s (which includes the time to create and serialize LocalSnapshot data in the state database).

Adding 8240 files at once took 3m, 25s.

meejah (Collaborator, Author) commented Jan 25, 2022

Since this now uses Twisted's Cooperator, the scanner no longer does an unbounded amount of work at once.
As it's tagged "performance", we need to measure before further work is very useful.

In any case, it seems a lot of the slowdown comes from producing JSON for either the status API or the Eliot logs: adding the 2409 files in my Twisted checkout produced 354MiB of Eliot logs, consisting of 1.7M lines of JSON.
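
For context, here is a minimal sketch of the cooperative-iteration pattern referred to above, using Twisted's `cooperate()` (the `scan_files` generator and its arguments are illustrative assumptions, not the actual scanner code):

```python
from twisted.internet.task import cooperate

def scan_files(paths, process_one):
    """
    Process 'paths' a little at a time, yielding control back to the
    reactor between items instead of doing all the work in one go.
    (Illustrative sketch only; not the real magic-folder scanner.)
    """
    def work():
        for path in paths:
            process_one(path)
            yield  # give the Cooperator a chance to schedule other work
    # cooperate() interleaves this iterator with other cooperative tasks
    task = cooperate(work())
    return task.whenDone()
```

Note that this bounds how much work happens per reactor iteration, but not how much ends up queued overall, which is why the water-mark idea above (or actual measurements) would still be needed.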

meejah closed this as completed Mar 1, 2022

hacklschorsch commented

2409 files leading to 1.7M lines of JSON - i.e. on the order of one thousand log lines per added file 🙀 Might be a good ticket to have, no?

meejah (Collaborator, Author) commented Oct 25, 2022

I think #632 counts ... it's not certain that the problem definitely is the "too much JSON" or whatever, and we'd need a performance test to know whether we got better or not.

(I presume that writing some of the performance tests suggested in that ticket would reveal problems like the one in the comment above.)
