Unstable test timing results #1190

masyagin1998 · 2024-12-07T04:34:52Z

I use judge-server to measure the performance of C++ solutions to a problem.

The problem consists of 100 identical tests (in fact, a single input and output file pair repeated 100 times in the YAML), yet I get quite different timings for them, even after doing the following:

since I'm running on AWS, I took a hardware (bare-metal) server and disabled hyper-threading; turbo boost is definitely off as well;
isolated two processor cores, 1 and 2, and pinned all my compiled solutions to core 1 and dmoj (only judge-server) to core 2;
disabled virtual address randomization;
set cstate to 1;
I also tried it locally, in case the issue was with AWS, and used cpufreq-utils to fix the CPU frequency at 3600 MHz, but I still get inconsistent results.
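For reference, settings like the ones listed above can be read back from Python to confirm they took effect. This is only a sketch: the file paths are the usual Linux sysfs/procfs locations and are assumptions here (they may be missing on some kernels or in containers), and `parse_cpu_list` and `read_setting` are hypothetical helper names, not part of judge-server.

```python
from __future__ import annotations

from pathlib import Path


def parse_cpu_list(text: str) -> set[int]:
    """Parse a kernel CPU list such as "1-2,5" into a set of CPU indices."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means "no CPUs"
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def read_setting(path: str) -> str | None:
    """Return the stripped file contents, or None if the file is unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


# Assumed standard locations; adjust for the machine at hand.
SETTINGS = {
    "isolated CPUs": "/sys/devices/system/cpu/isolated",
    "SMT active": "/sys/devices/system/cpu/smt/active",
    "ASLR mode": "/proc/sys/kernel/randomize_va_space",
    "cpu1 governor": "/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor",
}

if __name__ == "__main__":
    for name, path in SETTINGS.items():
        value = read_setting(path)
        print(f"{name}: {value if value is not None else '(not available)'}")
    isolated = read_setting(SETTINGS["isolated CPUs"])
    if isolated is not None:
        print("parsed isolated set:", sorted(parse_cpu_list(isolated)))
```

A missing file is reported as "(not available)" instead of raising, since not every kernel exposes all of these.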

Do you have any suggestions for this?

Case #1:	AC	[4571µs,	1.52 MB]
Case #2:	AC	[4044µs,	1.52 MB]
Case #3:	AC	[3931µs,	1.52 MB]
Case #4:	AC	[3939µs,	1.52 MB]
Case #5:	AC	[4192µs,	1.52 MB]
Case #6:	AC	[3927µs,	1.52 MB]
Case #7:	AC	[3831µs,	1.52 MB]
Case #8:	AC	[3761µs,	1.52 MB]
Case #9:	AC	[3785µs,	1.52 MB]
Case #10:	AC	[3817µs,	1.52 MB]
Case #11:	AC	[3804µs,	1.52 MB]
Case #12:	AC	[3795µs,	1.52 MB]
Case #13:	AC	[3851µs,	1.52 MB]
Case #14:	AC	[3809µs,	1.52 MB]
Case #15:	AC	[3856µs,	1.52 MB]
Case #16:	AC	[3764µs,	1.52 MB]
Case #17:	AC	[3863µs,	1.52 MB]
Case #18:	AC	[3775µs,	1.52 MB]
Case #19:	AC	[3791µs,	1.52 MB]
Case #20:	AC	[3775µs,	1.52 MB]
Case #21:	AC	[3711µs,	1.52 MB]
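To put a number on the spread, the results above can be summarized with a short script. This is a minimal sketch that assumes the exact `Case #N: AC [Tµs, M MB]` line format shown above; `parse_times_us` and `summarize` are hypothetical helper names, and the `warmup` parameter is an illustrative way to drop the first few cases.

```python
from __future__ import annotations

import re
import statistics

# Matches lines such as "Case #1:\tAC\t[4571µs,\t1.52 MB]".
# Both the micro sign (U+00B5) and Greek mu (U+03BC) are accepted.
CASE_RE = re.compile(r"Case #(\d+):\s+\w+\s+\[(\d+)\s*[\u00b5\u03bc]s")


def parse_times_us(report: str) -> list[int]:
    """Return the per-case times in microseconds, in report order."""
    return [int(m.group(2)) for m in CASE_RE.finditer(report)]


def summarize(times: list[int], warmup: int = 0) -> dict[str, float]:
    """Summarize the timings after dropping the first `warmup` cases."""
    steady = times[warmup:]
    mean = statistics.mean(steady)
    return {
        "min": min(steady),
        "max": max(steady),
        "mean": mean,
        # Range of the samples as a percentage of their mean.
        "spread_pct": 100.0 * (max(steady) - min(steady)) / mean,
    }


if __name__ == "__main__":
    report = (
        "Case #1:\tAC\t[4571\u00b5s,\t1.52 MB]\n"
        "Case #2:\tAC\t[4044\u00b5s,\t1.52 MB]\n"
        "Case #3:\tAC\t[3931\u00b5s,\t1.52 MB]\n"
        "Case #4:\tAC\t[3939\u00b5s,\t1.52 MB]\n"
    )
    times = parse_times_us(report)
    print("all cases:     ", summarize(times))
    print("after 1 warmup:", summarize(times, warmup=1))
```

Comparing the summary with and without a few warm-up cases makes it easier to separate a start-up effect from steady-state jitter.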


Xyene · 2024-12-07T05:36:51Z

Some thoughts:

Hyper-threading off is a good start.
The fact that cases start off at ~4.5ms and end up at ~3.8ms in the steady state suggests some sort of caching effect. I would probably not expect it to be the page cache, as the binary would have entered the page cache as part of being written out by gcc. It might be the input data itself, since the judge needs to read it and feed it into the submission via a pipe (so maybe the submission occasionally gets starved?). I don't know for sure, but that's the direction I'd suspect on priors once you're sure the system is otherwise configured correctly (see below).

Sandboxing some (but not all) system calls requires trapping into the judge process. That'll be anything not marked ALLOW here:

judge-server/dmoj/cptbox/isolate.py

Lines 57 to 60 in f9e3356

    
           self.update( 
        
               { 
        
                   # Deny with report 
        
                   sys_openat: self.handle_openat(dir_reg=0, file_reg=1, flag_reg=2),

+ anything individual executors choose to sanitize (nothing else, for C++). This will introduce jitter; the judge makes no attempt at deterministic sanitization of these system calls. I think this is reasonable: file I/O system calls (the majority of that list) mostly occur during runtime initialization. I don't expect C++ to be making many of them (a handful for libc.so, etc.).

The judge already disables ASLR for submissions, so you don't need to do that independently.
Not sure what you mean by your C-state changes. Ideally you want to lock the CPU into C0.
Isolating cores is good, but you probably also want to steer interrupts away from the isolated cores. Check /proc/interrupts. Many drivers do not respect isolcpus settings, and need to have their IRQs moved off manually. Also consider enabling nohz_full to save a 1/ms timer tick interrupt.
If you're on a physical Intel machine and are interested in tracking down the ~4.5ms → ~3.8ms jump, I'd use Intel Processor Trace to trace the full execution of the submission. (You'd have to make a few judge modifications to start the instrumentation at the right places.) As part of my $day_job I maintain a tool that I would personally use if I were investigating this.
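As a rough illustration of the /proc/interrupts check mentioned above: the counters there are cumulative, so one approach is to take two snapshots and see which interrupt sources advanced on the isolated cores in between. This is a sketch under assumptions, not judge code; `parse_interrupts` and `irqs_hitting` are hypothetical helper names, and cores 1 and 2 are taken from the setup described in the question.

```python
from __future__ import annotations

import time
from pathlib import Path


def parse_interrupts(text: str) -> dict[str, dict[int, int]]:
    """Parse /proc/interrupts text into {irq_label: {cpu: count}}."""
    lines = text.splitlines()
    if not lines:
        return {}
    # The header row lists the online CPUs: "CPU0 CPU1 ...".
    cpus = [int(name[3:]) for name in lines[0].split() if name.startswith("CPU")]
    table: dict[str, dict[int, int]] = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()[: len(cpus)]
        # Rows such as ERR/MIS carry a single global count; skip them.
        if len(fields) < len(cpus) or not all(f.isdigit() for f in fields):
            continue
        table[label.strip()] = {cpu: int(f) for cpu, f in zip(cpus, fields)}
    return table


def irqs_hitting(
    before: dict[str, dict[int, int]],
    after: dict[str, dict[int, int]],
    isolated: set[int],
) -> dict[str, dict[int, int]]:
    """Return per-IRQ count increases that landed on the isolated CPUs."""
    hits: dict[str, dict[int, int]] = {}
    for label, counts in after.items():
        old = before.get(label, {})
        delta = {
            cpu: n - old.get(cpu, 0)
            for cpu, n in counts.items()
            if cpu in isolated and n - old.get(cpu, 0) > 0
        }
        if delta:
            hits[label] = delta
    return hits


if __name__ == "__main__":
    path = Path("/proc/interrupts")
    if path.exists():
        first = parse_interrupts(path.read_text())
        time.sleep(1.0)  # ideally, run a judging session here instead
        second = parse_interrupts(path.read_text())
        # CPUs 1 and 2 are the isolated cores from the question (assumption).
        for label, delta in irqs_hitting(first, second, {1, 2}).items():
            print(label, delta)
```

Any source that shows up in the output is a candidate for having its IRQ affinity moved to a housekeeping core.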

Let us know what you find! I'm curious to know what you end up learning.

Xyene · 2024-12-07T05:42:08Z

Also, to add: the input data caching problem would nominally be addressed by #990, but that PR hasn't made it past the finish line.
