
Test results more consistently #74

Closed
gurgunday opened this issue May 4, 2024 · 10 comments · Fixed by #256

@gurgunday
Contributor

nodejs/performance#166 (comment) suggests that some results are unreliable. If dead code elimination is indeed taking place, we're not getting correct results for those operations.

We should investigate and find a way to get results that reflect reality.

@RafaelGSS
Owner

Dead code elimination is taking place. The idea of this repository is to compare two versions of Node.js performing pretty much the same operation. If version X performs dead code elimination (turning the operation into a noop) and X+1 doesn't, the benchmark should report that. The percentage is irrelevant in microbenchmarking. If we include operations to prevent dead code elimination (for instance, %DoNotOptimize), the results won't reflect reality either.

The idea is pretty simple: these operations changed in Node.js 22 for that specific (and unrealistic) workload.

@gurgunday
Contributor Author

I see. I'm not disagreeing; it actually fulfilled its purpose by surfacing that change.

But I wasn't alluding to %DoNotOptimize. I mean small modifications to certain tests so that we measure the actual operations' performance: V8 can still optimize as much as it can, but it doesn't get to eliminate the whole operation.

Small modifications can get us there:

nodejs/performance#166 (comment)

@RafaelGSS
Owner

I agree! I also suggest moving to another benchmark tool; tinybench seemed more reliable in my latest research.

@eugene1g

eugene1g commented May 4, 2024

> The idea of this repository is to compare two versions of Node.js performing pretty much the same operation.

IMO the problem is that when/if dead code elimination is at play, the current architecture for benchmarks does not measure the operation. If my application does millions of [].includes() calls, the benchmark results say that this logic will run ~10x slower in v22 than in v20, but in reality there will be no difference at all, because my app code doesn't get eliminated. If the elimination theory is true, then the current benchmark results are wrong, because the benchmarking code calling [].includes() never actually executed under v20 but did under v22. Such test behavior tells me nothing about how fast that operation runs in v20 vs v22.
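(For illustration, a minimal sketch, not from this repository, of the failure mode described above: once the optimizing compiler compiles the hot function, the unused includes() result lets it drop the call, so the loop ends up timing nothing.)

// The result of includes() is never observed, so an optimizing
// compiler is free to eliminate the call entirely.
function iteration() {
  [1, 2, 3].includes(2);
}

for (let i = 0; i < 1e7; i++) {
  iteration(); // after optimization this can reduce to an empty loop
}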

@RafaelGSS
Owner

Is there a way to see if the code is reaching dead code elimination? I think it's fair to consider all code to be optimized by V8 (Maglev/TurboFan). We should only make sure that we are not comparing a noop with a real, optimized operation.

cc: @joyeecheung

@joyeecheung

joyeecheung commented May 10, 2024

> Is there a way to see if the code is reaching dead code elimination?

With TurboFan you could figure it out by running with --trace-turbo, then using tools/turbolizer in V8 to look at the graph changes. The elimination is done on basic blocks, so it can be difficult to map back to the original code, depending on how complex your code is. With Maglev, I guess some of the --trace-maglev-* flags allow you to figure it out too, but I've never tried them myself.
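(A hedged sketch of that workflow: --trace-turbo is a real V8 flag that Node passes through, and turbolizer lives under tools/turbolizer in a V8 checkout, but the file and function names here are illustrative.)

// Run with: node --trace-turbo bench.js
// This emits turbo-*.json artifacts that tools/turbolizer can load,
// letting you inspect the optimized graph of `hot` for eliminated blocks.
function hot() {
  return [1, 2, 3].includes(2);
}

for (let i = 0; i < 1e6; i++) hot(); // enough iterations for TurboFan to compile it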

To avoid dead code elimination, IMO the best way is just to make your benchmarks more sophisticated. If a benchmark is too simple and unrealistic, you are basically trying to outsmart the optimizing compilers so that they fail to notice the code isn't doing anything, even though a human can see it's not doing anything. But optimizing compilers are designed to be smart enough for this (at least TurboFan is).
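(As a hedged sketch of what "more sophisticated" can look like in practice, with illustrative names: vary the inputs and consume the results, so the compiler cannot prove the work is dead.)

// Inputs built at runtime, so the calls cannot be constant-folded away.
const haystacks = Array.from({ length: 1000 }, (_, i) => [i, i + 1, i + 2]);

function bench() {
  let hits = 0;
  for (const arr of haystacks) {
    if (arr.includes(500)) hits++; // every result feeds into `hits`
  }
  return hits;
}

console.log(bench()); // consume the result so the whole loop stays live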

Maglev is dumber because it cuts corners in optimization, under the assumption that most code doesn't need aggressive optimizations, which can be expensive in their own right, while the compilation speed of that code may play a more important role in overall performance. In the real world, a lot of code never gets hot enough to be worth expensive optimizations. Trying to measure how a pattern performs when hot may already be missing the point if the user code using that pattern isn't hot enough, will only ever be handled by the interpreter or the baseline compiler/mid-tier optimizing compiler, and compiling that code takes more time in the application than actually executing it (which can happen more frequently in CLIs).

@RafaelGSS
Owner

This could be fixed by #256.

@gurgunday
Contributor Author

gurgunday commented Sep 27, 2024

Thank you so much!

By the way, what do you think of assigning the result of an operation to a global variable? It seems to fix a lot of the dead code elimination issues for me when benchmarking.

Something like:

import { Bench } from "tinybench";

// `html` is the tagged-template function being benchmarked (defined elsewhere).
let result = "";

const bench = new Bench({ time: 500 });

bench.add("simple HTML formatting", () => {
  result = html`<div>Hello, world!</div>`;
});

bench.add("null and undefined expressions", () => {
  result = html`<p>${null} and ${undefined}</p>`;
});

await bench.run();
console.table(bench.table());
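(A brief note on why this tends to help, as a general observation rather than a guarantee: the store to a module-scope binding is an observable side effect, so the engine can only discard the work if it proves the binding is never read, which is much harder than dropping a value that is provably unused.)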

@RafaelGSS
Owner

RafaelGSS commented Sep 27, 2024

It's not guaranteed that it will fix dead code elimination; other optimizations might still take place. I was discussing this on the V8 mailing list, and a feasible (but not realistic) way is to use %NeverOptimizeFunction for that. bench-node, by default, compiles your benchmark using %NeverOptimizeFunction. That's why I said "could be fixed by ...". Although, in my tests, it doesn't guarantee non-optimization either.
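(For reference, a minimal sketch of the natives-syntax approach mentioned above; %NeverOptimizeFunction and the --allow-natives-syntax flag are real V8 features, while the function name is illustrative.)

// Run with: node --allow-natives-syntax bench.js
function target() {
  return [1, 2, 3].includes(2);
}

%NeverOptimizeFunction(target); // keep `target` out of the optimizing tiers

for (let i = 0; i < 1e6; i++) target();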
