
Test results more consistently #74

Closed
gurgunday opened this issue May 4, 2024 · 10 comments · Fixed by #256

@gurgunday
Contributor

nodejs/performance#166 (comment) suggests that some results are unreliable. If dead code elimination is indeed taking place, we're not getting correct results for those operations.

We should investigate and find a way to get results that reflect reality.

@RafaelGSS
Owner

Dead code elimination is taking place. The idea of this repository is to compare two versions of Node.js performing pretty much the same operation. If version X performs dead code elimination (turning the operation into a noop) and X+1 doesn't, the benchmark should report that. The percentage is irrelevant in microbenchmarking. If we include operations to prevent dead code elimination (for instance, %DoNotOptimize), the results won't reflect reality either.

The idea is pretty simple: these operations changed in Node.js 22 for that specific (and unrealistic) workload.

@gurgunday
Contributor Author

I see. I'm not disagreeing; it actually fulfilled its purpose by surfacing that change.

But I wasn't alluding to %DoNotOptimize. I mean small modifications to certain tests so that we measure the actual operations' performance: V8 can still optimize as much as it can, but it doesn't get to eliminate the whole operation.

Small modifications can get us there:

nodejs/performance#166 (comment)

@RafaelGSS
Owner

I agree! I also suggest moving to another benchmark tool; tinybench seemed more reliable in my latest research.

@eugene1g

eugene1g commented May 4, 2024

> The idea of this repository is to compare two versions of Node.js performing pretty much the same operation.

IMO the problem is that when/if dead code elimination is at play, the current architecture for benchmarks does not measure the operation. If my application does millions of [].includes() calls, the benchmark results say that this logic will run ~10x slower in v22 than in v20, but in reality there will be no difference at all, because my app code doesn't get eliminated. If the elimination theory is true, then the current benchmark results are wrong, because the benchmarking code calling [].includes() never actually executed under v20 but did under v22. Such test behavior tells me nothing about how fast that operation runs in v20 vs v22.
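(For illustration, a minimal sketch, not from this repository, of the failure mode described above: once the optimizing compiler compiles the hot function, the unused includes() result lets it drop the call, so the loop ends up timing nothing.)

// The result of includes() is never observed, so an optimizing
// compiler is free to eliminate the call entirely.
function iteration() {
  [1, 2, 3].includes(2);
}

for (let i = 0; i < 1e7; i++) {
  iteration(); // after optimization this can reduce to an empty loop
}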

@RafaelGSS
Owner

Is there a way to see if the code is reaching dead code elimination? I think it's fair to consider all code to be optimized by V8 (Maglev/TurboFan). We should only make sure that we are not comparing a noop with a real, optimized operation.

cc: @joyeecheung

@joyeecheung

joyeecheung commented May 10, 2024

> Is there a way to see if the code is reaching dead code elimination?

With TurboFan you could figure it out by running with --trace-turbo, then using tools/turbolizer in V8 to look at the graph changes. The elimination is done on basic blocks, so it can be difficult to map back to the original code, depending on how complex your code is. With Maglev, I guess some of the --trace-maglev-* flags allow you to figure it out too, but I've never tried them myself.
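(A hedged sketch of that workflow: --trace-turbo is a real V8 flag that Node passes through, and turbolizer lives under tools/turbolizer in a V8 checkout, but the file and function names here are illustrative.)

// Run with: node --trace-turbo bench.js
// This emits turbo-*.json artifacts that tools/turbolizer can load,
// letting you inspect the optimized graph of `hot` for eliminated blocks.
function hot() {
  return [1, 2, 3].includes(2);
}

for (let i = 0; i < 1e6; i++) hot(); // enough iterations for TurboFan to compile it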

To avoid dead code elimination, IMO the best way is just to make your benchmarks more sophisticated. If a benchmark is too simple and unrealistic, you are basically trying to outsmart the optimizing compilers so that they fail to notice the code isn't doing anything, even though a human can see it's not doing anything. But optimizing compilers are designed to be smart enough for this (at least TurboFan is).
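(As a hedged sketch of what "more sophisticated" can look like in practice, with illustrative names: vary the inputs and consume the results, so the compiler cannot prove the work is dead.)

// Inputs built at runtime, so the calls cannot be constant-folded away.
const haystacks = Array.from({ length: 1000 }, (_, i) => [i, i + 1, i + 2]);

function bench() {
  let hits = 0;
  for (const arr of haystacks) {
    if (arr.includes(500)) hits++; // every result feeds into `hits`
  }
  return hits;
}

console.log(bench()); // consume the result so the whole loop stays live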

Maglev is dumber because it cuts corners in optimization, under the assumption that most code doesn't need aggressive optimizations, which can be expensive in their own right, while the compilation speed of that code may play a more important role in overall performance. In the real world, a lot of code never gets hot enough to be worth expensive optimizations. Trying to measure how a pattern performs when hot may already be missing the point if the user code using that pattern isn't hot enough, will only ever be handled by the interpreter or the baseline compiler/mid-tier optimizing compiler, and compiling that code takes more time in the application than actually executing it (which can happen more frequently in CLIs).

@RafaelGSS
Owner

This could be fixed by #256.

@gurgunday
Contributor Author

gurgunday commented Sep 27, 2024

Thank you so much!

By the way, what do you think of assigning the result of an operation to a global variable? It seems to fix a lot of the dead code elimination issues for me when benchmarking.

Something like:

import { Bench } from "tinybench";

// `html` is the tagged-template function being benchmarked (defined elsewhere).
let result = "";

const bench = new Bench({ time: 500 });

bench.add("simple HTML formatting", () => {
  result = html`<div>Hello, world!</div>`;
});

bench.add("null and undefined expressions", () => {
  result = html`<p>${null} and ${undefined}</p>`;
});

await bench.run();
console.table(bench.table());
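(A brief note on why this tends to help, as a general observation rather than a guarantee: the store to a module-scope binding is an observable side effect, so the engine can only discard the work if it proves the binding is never read, which is much harder than dropping a value that is provably unused.)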

@RafaelGSS
Owner

RafaelGSS commented Sep 27, 2024

It's not guaranteed that it will fix dead code elimination; other optimizations might still take place. I was discussing this on the V8 mailing list, and a feasible (but not realistic) way is to use %NeverOptimizeFunction for that. bench-node, by default, compiles your benchmark using %NeverOptimizeFunction. That's why I said "could be fixed by ...". Although, in my tests, it doesn't guarantee non-optimization either.
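(For reference, a minimal sketch of the natives-syntax approach mentioned above; %NeverOptimizeFunction and the --allow-natives-syntax flag are real V8 features, while the function name is illustrative.)

// Run with: node --allow-natives-syntax bench.js
function target() {
  return [1, 2, 3].includes(2);
}

%NeverOptimizeFunction(target); // keep `target` out of the optimizing tiers

for (let i = 0; i < 1e6; i++) target();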
