State of Data Cache Optimizations #178
Hi Andrey,
Really happy that you see the performance gains from using BOLT and even happier that you are willing to improve BOLT with your current and upcoming contributions!
Data cache optimizations have been on our radar for quite some time. We've started with jump tables placement optimization (
The gains from static data reordering will depend heavily on the application. We haven't measured them on any open-source benchmark, hence the lack of data in our publications. We did see a decrease in D$ misses with jump tables splitting for some of our workloads.
For real-world estimation, you will have to look at D$/TLB miss rates and their distribution in the address space. With special counters (
In the near future, we'll be looking into static data reordering in both LLD and BOLT. We will need additional support from the linker for the feature to be usable in BOLT. And compiling with
For making dynamic memory allocations cache-friendly, you'll need more than just BOLT. E.g., the HALO paper (https://dl.acm.org/doi/10.1145/3368826.3377914) uses BOLT and additionally has to modify heap allocators. The following RFC from Google uses a sanitizer technique for profiling heap allocations: https://groups.google.com/g/llvm-dev/c/0PN-rBV9WAs/m/MfF8OmJIAQAJ.
Cheers,
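As a rough illustration of the "distribution in the address space" analysis mentioned above, here is a minimal sketch. It assumes that sampled D$/TLB miss addresses have already been exported from a profiler as plain integers; the helper names `misses_per_page` and `hottest_pages` and the 4 KiB page size are hypothetical choices for this example, not part of BOLT or of any profiler.

```python
from collections import Counter

PAGE_SIZE = 4096  # assumption: 4 KiB pages; adjust for the target system


def misses_per_page(addresses, page_size=PAGE_SIZE):
    """Count sampled miss addresses per page; returns {page_base: count}."""
    counts = Counter()
    for addr in addresses:
        # Round each address down to the base of the page that contains it.
        counts[addr - addr % page_size] += 1
    return counts


def hottest_pages(addresses, top=5, page_size=PAGE_SIZE):
    """Return the `top` pages with the most sampled misses."""
    return misses_per_page(addresses, page_size).most_common(top)


if __name__ == "__main__":
    # Hypothetical sample addresses, for illustration only.
    samples = [0x601000, 0x601040, 0x601FF8, 0x602010]
    for base, n in hottest_pages(samples):
        print(f"{base:#x}: {n} sampled misses")
```

If a handful of pages in the static data sections account for most of the samples, static data reordering is more likely to pay off; a flat distribution, or one dominated by heap addresses, suggests looking elsewhere.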
Hi Maksim, Thank you for the quick and informative reply!
That's encouraging to know! The biggest question mark for us is whether this direction is worth investing our limited resources in at all.
Makes total sense. Will do.
Cool! Please keep us in the loop; we might contribute -- at least with testing on a different corpus of applications; most likely with code as well.
Thanks for these pointers!
Yours,
Hi Maksim, One more thing, if you don't mind.
Could you please elaborate a bit on what exactly you need from the linker/compiler? My naive understanding is that BOLT can move statically allocated data as it pleases, no?
Yours,
Hi Andrey,
In the compiler, we would like to enable
Suppose you have a local object that is referenced with some offset
Cheers,
Hi Maksim, Thanks! -- it's clear now.
I agree that relying on debug info is a non-starter. If anything, I doubt that all the optimizations preserve it fully correctly.
Yours,
Hi BOLT Team,
First of all, thank you again for creating such a wonderful tool! We (Huawei) are getting very real performance gains thanks to BOLT. Also, as you probably noticed, we are filling the remaining gaps and contributing our changes back upstream. I hope that the rate of our contributions will only increase in the near future.
The next frontier for us is data cache optimizations. Apparently, BOLT already has some support (options starting with -reorder-data), but it looks like it's not as mature as the code cache optimizations.
Could you please elaborate a bit more on the current state of data cache-directed optimizations? Specifically:
We may start some development in this direction, and obviously, we don't want to simply repeat what you already covered.
Thank you!
Yours,
Andrey
===
Advanced Software Technology Lab
Huawei