openlibm performance on ARM server is very poor #282
Comments
Some important information is missing. What operating system? What compiler/toolchain? What happens if -fno-gnu89-inline is removed from the command line? If you're using gcc, what happens if you use -march=native -mtune=native? Speed isn't everything. Have you checked accuracy? |
Ubuntu 22.04, gcc 11.4.0, running on a Neoverse V1 server. Removing -fno-gnu89-inline made no difference, and specifying -mtune and -march also had no effect. I ran the test suite and everything passed. I also went into the test Makefile and re-enabled building of test-float-system and test-double-system, which produced the following:
I was however able to get slightly better results when compiling with clang 14.0:
If you would like me to run further tests, I would be more than happy to do so. Thanks! |
Yes: use the CORE-MATH code, which has efficiency comparable to GNU libc (see https://core-math.gitlabpages.inria.fr/64.pdf) and delivers correct rounding. |
Unfortunately, I cannot help at the source code level as I do not have access to an ARM server. There are newer versions of GCC and the release notes show new arm processors have been added as well as changes to code generation since 11.4.0 was released. You may want to try an updated GCC. In your results, I would look at |
@zimmermann6 Unfortunately core-math compiles with errors on ARM as it uses x86 intrinsics. We may be able to get around this by using a library to convert the calls into their corresponding NEON instructions, but I would worry that there could be some subtle differences that would throw off the results. Some tests also fail on ARM, for example:
Building with GCC 12.3.0 (which is the latest version in my package manager), the performance is either the same as it was in 11.4.0, or even slightly slower. |
Hi @jmather-sesi I can reproduce on cfarm117, I will investigate. |
This issue is fixed. For the record, it was due to a different conversion from the double value 0x1p64 to int64_t. I suggest we follow up on core-math issues on the core-math mailing list. |
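For context on why that conversion can differ: 0x1p64 is outside the range of int64_t, so a plain C cast is undefined behavior. In practice the x86-64 conversion instruction typically yields INT64_MIN while the AArch64 one saturates to INT64_MAX. The following is only a minimal sketch of a portable, explicitly saturating conversion; it is not the actual core-math fix, and the helper name `double_to_i64_sat` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper (not from core-math or openlibm): convert a double to
 * int64_t with explicitly defined results for out-of-range inputs, so the
 * outcome does not depend on the target's conversion instruction. */
static int64_t double_to_i64_sat(double x)
{
    if (x != x)            /* NaN: choose 0 by convention */
        return 0;
    if (x >= 0x1p63)       /* 2^63 and above do not fit, including 0x1p64 */
        return INT64_MAX;
    if (x < -0x1p63)       /* below -2^63 does not fit */
        return INT64_MIN;
    return (int64_t)x;     /* in range: the cast truncates toward zero */
}
```

With this helper, `double_to_i64_sat(0x1p64)` gives INT64_MAX on both x86-64 and AArch64, whereas a bare `(int64_t)0x1p64` may not agree across the two.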
I see very poor results on a modern ARM server. Some openlibm implementations are up to 48.68x slower than their libm counterparts. This was a checkout of the main branch at 12f5ffc. This is also related to #234, but the performance difference seems to be even more dramatic.
For interest and potential usefulness to #203, I also compared it against an optimized build of musl 1.2.4:
GNU libc version: 2.35
GNU libc release: stable
The openlibm compilation line looks like:
I have tried compiling openlibm with just a bare `make`, and also specifying the architecture directly with `make ARCH=aarch64`, to identical results.
Is there something we can do about this?