Unstable test timing results #1190

Open
masyagin1998 opened this issue Dec 7, 2024 · 2 comments

masyagin1998 commented Dec 7, 2024

I use judge-server to measure the performance of C++ problem solutions.

The problem consists of 100 similar tests (in fact, a single input file and a single output file repeated 100 times in the YAML), yet I get quite different results for them, even though I've done the following:

  • since I'm running on an AWS instance, I took a hardware server and disabled hyper-threading. Turbo boost is definitely off as well;

  • isolated two processor cores, 1 and 2, and pinned processes to them: all my compiled solutions run on core 1, and DMOJ (only judge-server) runs on core 2;

  • disabled virtual address space randomization (ASLR);

  • set cstate to 1;

  • I also tried it locally, since I thought there might be some issue with AWS, and used cpufreq-utils to fix the CPU frequency at 3600 MHz, but I still get inconsistent results.

Do you have any suggestions for this?

Case #1:	AC	[4571µs,	1.52 MB]
Case #2:	AC	[4044µs,	1.52 MB]
Case #3:	AC	[3931µs,	1.52 MB]
Case #4:	AC	[3939µs,	1.52 MB]
Case #5:	AC	[4192µs,	1.52 MB]
Case #6:	AC	[3927µs,	1.52 MB]
Case #7:	AC	[3831µs,	1.52 MB]
Case #8:	AC	[3761µs,	1.52 MB]
Case #9:	AC	[3785µs,	1.52 MB]
Case #10:	AC	[3817µs,	1.52 MB]
Case #11:	AC	[3804µs,	1.52 MB]
Case #12:	AC	[3795µs,	1.52 MB]
Case #13:	AC	[3851µs,	1.52 MB]
Case #14:	AC	[3809µs,	1.52 MB]
Case #15:	AC	[3856µs,	1.52 MB]
Case #16:	AC	[3764µs,	1.52 MB]
Case #17:	AC	[3863µs,	1.52 MB]
Case #18:	AC	[3775µs,	1.52 MB]
Case #19:	AC	[3791µs,	1.52 MB]
Case #20:	AC	[3775µs,	1.52 MB]
Case #21:	AC	[3711µs,	1.52 MB]
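
To put numbers on the variation above, a small script can parse the judge output and compare the first few cases against the later, steadier ones. This is only a sketch: the function names `parse_timings` and `summarize`, and the choice of treating the first five cases as warm-up, are illustrative assumptions and not part of judge-server.

```python
import re
import statistics

# Matches lines such as "Case #1:  AC  [4571µs,  1.52 MB]".
# Both the micro sign (U+00B5) and the Greek mu (U+03BC) are accepted.
CASE_RE = re.compile(r"Case #(\d+):\s+(\w+)\s+\[(\d+)[µμ]s,")


def parse_timings(log: str) -> list[int]:
    """Return per-case runtimes in microseconds, in log order."""
    return [int(m.group(3)) for m in CASE_RE.finditer(log)]


def summarize(times: list[int], warmup: int = 5) -> dict[str, float]:
    """Compare all cases against the cases after an assumed warm-up phase.

    `warmup` is a hypothetical cut-off chosen by eye, not a judge setting.
    At least two cases must remain after the cut-off.
    """
    steady = times[warmup:]
    return {
        "all_mean": statistics.mean(times),
        "steady_mean": statistics.mean(steady),
        "steady_stdev": statistics.stdev(steady),
        # Spread between the slowest and fastest case, relative to the fastest.
        "spread_pct": 100 * (max(times) - min(times)) / min(times),
    }


if __name__ == "__main__":
    # First eight cases copied from the output above.
    sample = "\n".join(
        f"Case #{i}:\tAC\t[{t}µs,\t1.52 MB]"
        for i, t in enumerate(
            [4571, 4044, 3931, 3939, 4192, 3927, 3831, 3761], start=1
        )
    )
    print(summarize(parse_timings(sample), warmup=5))
```

Running it over the full 100-case output would show whether the steady-state spread is small enough that only the first few cases are the real outliers.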

Xyene commented Dec 7, 2024

Some thoughts:

  • Hyper-threading off is a good start.

  • The fact that cases start off ~4.5ms and end up ~3.8ms in the steady state suggests some sort of caching effect. I would probably not expect page cache, as the binary would have entered the page cache as part of being written out from gcc. It might be the input data itself, since the judge needs to read it and feed it into the submission via a pipe (so maybe the submission occasionally gets starved?) I don't know for sure, but that's the direction I'd suspect on priors once you're sure the system is otherwise configured correctly (see below).

  • Sandboxing some (but not all) system calls requires trapping into the judge process. That'll be anything not marked ALLOW here:

    self.update(
        {
            # Deny with report
            sys_openat: self.handle_openat(dir_reg=0, file_reg=1, flag_reg=2),

    plus anything individual executors chose to sanitize (nothing else, for C++). This will introduce jitter; the judge makes no attempt at deterministic sanitization of these system calls. I think this is reasonable: file I/O system calls (the majority of that list) mostly occur during runtime initialization. I don't expect C++ to be making many of them (a handful for libc.so, etc.).

  • The judge already disables ASLR for submissions, so you don't need to do that independently.

  • Not sure what you mean by your C-state changes. Ideally you want to lock the CPU into C0.

  • Isolating cores is good, but you probably also want to steer interrupts away from the isolated cores. Check /proc/interrupts. Many drivers do not respect isolcpus settings, and need to have their IRQs moved off manually. Also consider enabling nohz_full to save a 1/ms timer tick interrupt.

  • If you're on a physical Intel machine and are interested in tracking down the ~4.5ms → ~3.8ms jump, I'd use Intel Processor Trace to trace the full execution of the submission. (You'd have to make a few judge modifications to start the instrumentation at the right places.) As part of my $day_job I maintain a tool that I would personally use if I were investigating this.
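
One way to act on the /proc/interrupts suggestion above is to snapshot the file before and after a judging run and list the interrupt sources whose counters grew on the isolated cores. The following is a minimal sketch, assuming the usual layout of /proc/interrupts (a header row of CPU names followed by one row per interrupt source); the helper names `parse_interrupts` and `irqs_hitting` are hypothetical.

```python
from pathlib import Path


def parse_interrupts(text: str) -> dict[str, dict[int, int]]:
    """Parse /proc/interrupts text into {irq_label: {cpu_id: count}}."""
    lines = text.strip("\n").splitlines()
    # Header looks like "CPU0 CPU1 ..."; strip the "CPU" prefix to get ids.
    cpu_ids = [int(name[3:]) for name in lines[0].split()]
    table: dict[str, dict[int, int]] = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        fields = rest.split()
        counts = fields[: len(cpu_ids)]
        # Rows such as ERR/MIS carry a single total, not per-CPU counts.
        if not sep or len(counts) < len(cpu_ids):
            continue
        if all(c.isdigit() for c in counts):
            table[label.strip()] = dict(zip(cpu_ids, map(int, counts)))
    return table


def irqs_hitting(before, after, cpus):
    """Return {irq_label: {cpu_id: delta}} for IRQs that fired on `cpus`."""
    hits = {}
    for label, new in after.items():
        old = before.get(label, {})
        delta = {c: new[c] - old.get(c, 0) for c in cpus if c in new}
        if any(d > 0 for d in delta.values()):
            hits[label] = delta
    return hits


if __name__ == "__main__":
    # Cores 1 and 2 are the isolated cores mentioned earlier in the thread.
    proc = Path("/proc/interrupts")
    before = parse_interrupts(proc.read_text())
    input("Run the submission now, then press Enter... ")
    after = parse_interrupts(proc.read_text())
    for label, delta in irqs_hitting(before, after, [1, 2]).items():
        print(label, delta)
```

Any numbered IRQ that shows up here is a candidate for being moved to another core, for example through /proc/irq/<n>/smp_affinity_list. Rows such as LOC (local timer) are per-CPU and cannot be moved that way, which is where nohz_full comes in.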

Let us know what you find! I'm curious to know what you end up learning.


Xyene commented Dec 7, 2024

Also, to add: the input data caching problem would nominally be addressed by #990, but that PR hasn't made it past the finish line.
