[VL] coredump if compiled by g++-11 #7950
Comments
@shuai-xu, thanks for raising this issue! I think we need to clean up the march setting with https://github.com/apache/incubator-gluten/blob/main/cpp/core/CMakeLists.txt#L27 |
@PHILO-HE After removing this line, it does not coredump. |
@shuai-xu, thanks so much for your feedback! As @zhouyuan told me, newer gcc versions (e.g., gcc-11) make full use of the native CPU's instructions and optimizations when |
@PHILO-HE Let's add -mno-avx512f to the Gluten cpp compile flags. It's used by Velox as well, and it can solve the issue fundamentally. @shuai-xu When you compile Gluten and Velox with -march=native, gcc optimizes the binary for the machine you are building Gluten on, not for the worker machines. To get the best performance, you may set -march=ivybridge or whichever machine type matches your worker machines. |
@FelixYBW Thank you for explaining, I learned a lot. |
But adding |
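For anyone who wants to check a worker machine for the mismatch discussed above, here is a minimal C++ sketch. It is not part of Gluten or Velox, and the function names (`compiledWithAvx512f`, `cpuSupportsAvx512f`, `isaMismatch`) are made up for illustration. It assumes GCC or Clang on x86, where the `__AVX512F__` macro and the `__builtin_cpu_supports` builtin are available. Note that the macro only reflects the flags used to compile this file, not the flags used for the Gluten libraries.

```cpp
// Hypothetical helpers (not from the Gluten or Velox code base).

// True if this translation unit was compiled with AVX-512F enabled,
// e.g. via -march=native on an AVX-512 capable build machine.
bool compiledWithAvx512f() {
#if defined(__AVX512F__)
  return true;
#else
  return false;
#endif
}

// True if the CPU this code is running on reports AVX-512F support.
// Falls back to false on compilers/targets without the builtin.
bool cpuSupportsAvx512f() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  return __builtin_cpu_supports("avx512f") != 0;
#else
  return false;
#endif
}

// A binary built with AVX-512F but run on a CPU without it may hit an
// illegal instruction, which would be consistent with the crash pattern
// reported in this issue.
bool isaMismatch(bool compiledWith, bool cpuSupports) {
  return compiledWith && !cpuSupports;
}
```

Calling `isaMismatch(compiledWithAvx512f(), cpuSupportsAvx512f())` from a small `main` built with the same flags as Gluten would report whether those flags enable AVX-512F on a machine that lacks it. If the build machine supports AVX-512F, such a probe could itself crash on an older worker, so it may be safer to build it once with `-march=native` and once without and compare the results.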
Backend
VL (Velox)
Bug description
I compiled Gluten with Velox and ran it with Spark. There are three machines in the test cluster. It always core dumps on machines 2 and 3 but runs normally on machine 1. The stack is:

I then compiled Gluten on another machine, and that build runs normally on all the machines. Comparing the two compile machines, the main difference is the g++ version: after switching from g++-11 to g++-10, it works. The machine details are listed in the System information section.
Spark version
Spark-3.3.x
Spark configurations
No response
System information
Compile machine 1:
Compile machine 2:
Run machine 1 is the same as Compile 2.
Run machine 2:
Run machine 3:
Relevant logs
No response