Scala code compiled differently between OpenJDK11 and Semeru Openj9, causing VerifyError #14054
Comments
@ChengJin01 Please take a look at this |
Just talked to @Sidong-Wei offline. The reason why the VerifyError was captured during runtime verification (which is correct as |
Hi @ChengJin01 thanks a lot for your input, I think you made a good point in claiming the issue has nothing to do with the Verifier or JVM, as the issue arises when the bytecode is compiled statically. I further looked into how the Kafka project uses the Java compiler and the Scala compiler to compile its source code. |
@Sidong-Wei, the java compiler ( The question is whether the |
I double-checked this project at https://github.com/apache/kafka/tree/3.0 and noticed that the [1] The background GradleDaemon: [2] The GradleWrapperMain (command line parser): [3] The Gradle Worker Daemon: By modifying
the issue disappeared (tried many times) and ended up with the correct bytecode:
My understanding is that the scala source must be loaded/compiled directly to bytecode in memory and was most likely optimized in there by JIT before being outputted into the class file but still unclear where the optimization occurred. So we need to get the JIT team involved in the investigation to see how & why this happened in this case. FYI: @0xdaryl |
@0xdaryl can you update this for the 0.30 release? |
JIT hasn't had a chance to look at the failure yet. Thanks @Sidong-Wei for your thorough investigation upon opening the PR. Given the apparent Z-specific nature of the problem, I'll ask @joransiu to assign for JIT investigation based on @ChengJin01's helpful analysis so far. This seems unlikely to be resolved for 0.30. |
I'll move it out. |
I'm wondering if I could get an update on where this issue stands? Is there a plan to implement it in a particular release? |
@joransiu it seems the question is for you. |
Apologies for the delay. I missed the original tag, so unfortunately, no updates to report. Will have someone look into this from the JIT perspective. |
Hey, @Sidong-Wei I am going to be looking into this failure. What commands did you use to set up and run?
curl -kn0OL https://archive.apache.org/dist/kafka/3.0.0/kafka-3.0.0-src.tgz
tar -xf kafka-3.0.0-src.tgz
cd kafka-3.0.0-src/
bin/zookeeper-server-start.sh config/zookeeper.properties
# In a separate console
bin/kafka-server-start.sh config/server.properties |
Hi @VermaSh I can help you with this issue. You can reproduce it from the unit tests, which may be easier. It's actually easier to set up with Kafka 3.2.0 now, which has better support for s390x in the jar dependencies.
mkdir src
curl -kn0OL https://archive.apache.org/dist/kafka/3.2.0/kafka-3.2.0-src.tgz
cd src/
tar xf ../kafka-3.2.0-src.tgz
cd kafka-3.2.0-src/
# I'm using IBM Semeru Runtime Certified Edition 11.0.15.0 (build 11.0.15+10)
export JAVA_HOME=/opt/jdk-11.0.15+10
./gradlew jar
# At this point you should be able to run through the quickstart guide but I haven't tried it.
# Several of the tests in the core module will fail. This is one of them:
./gradlew :core:test --tests FetchRequestBetweenDifferentIbpTest.testControllerOldToNewIBP
You should see an error that starts with:
Let me know if I can help |
Hi @VermaSh, just wondering if there is any update on this issue? |
Hi @jonathan-albrecht-ibm, Sorry I got swamped with some other work so don't have an update right now. But I'll try to have an update over next few days. |
Quick update: Disabling asynchronous JIT compilation for gradle worker daemon fixes the failures. Doing a binary search now to get the failing JIT method. |
Hi @VermaSh, just wanted to check if you were able to find the failing JIT method. |
Hey @jonathan-albrecht-ibm, it fails while I am going through the method log now, so I will have another update for you soon. |
I was able to narrow it down further to class hierarchy table. Running with |
Hi @VermaSh, just wanted to check in and see if you had any new info on this issue. |
Hi @jonathan-albrecht-ibm, unfortunately haven't had a chance to investigate further into the failure. I am hoping to have some free bandwidth later this week as things are cooling down for the other project. |
Haven't been able to make much progress here as I had to investigate a blocker on zLinux. I'll post an update as soon as I make progress on this failure. |
Hi @VermaSh, just wanted to check in and see if you had any new info on this issue. |
Hi @VermaSh, just wanted to check in again on this issue |
@jonathan-albrecht-ibm sorry, haven't had the bandwidth to look into this. |
@Spencer-Comin Can you take a look at the issue, given that it is seen with Kafka? For some reason, I cannot assign this one to you. |
RE: #14054 (comment) |
Hi @Spencer-Comin, just wanted to check if you are planning on working on this issue. It's currently unassigned. |
@jonathan-albrecht-ibm I’m off for the next two weeks; once I’m back I’ll continue working on this issue. |
I've poked around at this for the last few weeks. The issue disappears when I use Semeru 11.0.22+7. I've confirmed that the correct bytecode is produced.
Full disassembly here: AbstractIndex.11.0.22+7.disassembly.txt |
Thanks @Spencer-Comin that looks good to me. I'm okay to close this issue now. |
@Spencer-Comin, just wanted to check if you would like to close this issue. I don't have permission to do that. |
Issue Number: 14054 |
I thought assigning myself and reclosing the issue would let the bot know that I was the actual assignee, but apparently not |
Feel free to ignore comments such as #14054 (comment). They are an AI experiment and mean nothing. |
Java -version output
openjdk version "11.0.13" 2021-10-19
IBM Semeru Runtime Open Edition 11.0.13.0 (build 11.0.13+8)
Eclipse OpenJ9 VM 11.0.13.0 (build openj9-0.29.0, JRE 11 Linux s390x-64-Bit Compressed References 20211021_227 (JIT enabled, AOT enabled)
OpenJ9 - e1e72c4
OMR - 299b6a2d2
JCL - 2d83aa3b76 based on jdk-11.0.13+8)
Summary of problem
I built Apache Kafka version 3.0.0 with Semeru JDK 11.0.13 on an s390x machine, and when I started the service I hit the following VerifyError:
The same problem is not observed with Temurin JDK 11.0.13, or with Semeru JDK 11.0.13 on an amd64 machine.
Diagnostic files
As the error message suggested, I examined the compiled Java bytecode in the different environments. I found that the issue lies within this source file. The compiled code of this file differs among JDKs and architectures, while other files are consistent. Here is part of the difference:
Semeru JDK 11.0.13 on s390x:
Attaching the full text here: semeru_disassembled.txt
Temurin JDK 11.0.13 on s390x (Semeru JDK 11.0.13 on amd64 is similar):
Attaching the full text here: temurin_disassembled.txt
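Comparing two `javap -c` listings like the ones attached above is awkward by eye, because a single changed instruction shifts every later byte offset. The following is a minimal sketch, not the method used in this report, that strips the offsets and reports the first instruction that differs. The class and method names are hypothetical, and the two excerpts in `main` are abbreviated: only the `aload`/`athrow` instructions come from the report, and offset 276 is inferred from the two-byte width of `aload 9` rather than copied from the attachment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DisassemblyDiffSketch {
    // Matches a javap -c instruction line such as "  274: aload 9":
    // group 1 is the byte offset, group 2 is the instruction text.
    private static final Pattern INSTRUCTION =
            Pattern.compile("^\\s*(\\d+):\\s+(.*\\S)\\s*$");

    /** Returns the instructions of a listing with their byte offsets stripped. */
    static List<String> instructions(String listing) {
        List<String> result = new ArrayList<>();
        for (String line : listing.split("\\R")) {
            Matcher m = INSTRUCTION.matcher(line);
            if (m.matches()) {
                result.add(m.group(2));
            }
        }
        return result;
    }

    /** Index of the first differing instruction, or -1 if both lists are equal. */
    static int firstDifference(List<String> a, List<String> b) {
        int common = Math.min(a.size(), b.size());
        for (int i = 0; i < common; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return i;
            }
        }
        return a.size() == b.size() ? -1 : common;
    }

    public static void main(String[] args) {
        // Abbreviated, illustrative excerpts (see the note above about offsets).
        String temurin = "  274: aload 9\n  276: athrow\n";
        String semeru = "  273: aload_0\n  274: athrow\n";

        List<String> expected = instructions(temurin);
        List<String> actual = instructions(semeru);
        int index = firstDifference(expected, actual);
        if (index >= 0) {
            System.out.println("first difference at instruction " + index
                    + ": " + expected.get(index) + " vs " + actual.get(index));
        }
    }
}
```

Stripping offsets keeps the comparison focused on which instruction changed rather than on where it landed, which is the question that matters when a local variable slot goes missing.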
From the comparison, we can tell that local variable 9, present in the Temurin output, is missing from the Semeru output. Two operations involve this variable, `249: astore 9` and `274: aload 9`; on the Semeru s390x machine the latter was mistakenly compiled to `273: aload_0`, which causes the next operation, `274: athrow`, to throw the `kafka/log/AbstractIndex` object itself (held in local variable 0) instead of local variable 9. This matches what the error message says (`kafka/log/AbstractIndex` is not throwable, as it was never supposed to be thrown).
I compared the source code with the compiled Java bytecode, and I believe the snippet above is compiled from the following source block (lines 109 - 138 of the file):
Apparently there is a bug when compiling the code block above with Semeru on s390x, while the bug is not present when compiling on amd64 or with Temurin JDK. The most obvious sign of the bug is the missing local variable `9`, which unfortunately does not appear in the local variable table either. I am guessing that variable `9` relates to the `try` block, as it is supposed to be thrown at a certain point, but I could not get further information on that, and I also could not produce a simpler reproduction test case, as the issue seems to be pretty subtle and tricky.
Another thing that might help with debugging this issue is the Scala compilation options. In the environment above, I have been using Kafka's default Scala compilation flags, which are:
I also found that if I turn off the inliner optimization of the Scala compiler, it produces different Java bytecode, but the issue no longer occurs on s390x with Semeru JDK. Please see the attached file: semeru_no_inliner_disassembled.txt
As mentioned, turning off the inliner leads to a different bytecode file, but for this specific class, `kafka/log/AbstractIndex`, local variable `9` and the two corresponding operations are compiled correctly, so the VerifyError does not occur. So it seems the bug only happens on s390x and while the inliner is turned on.
I would really appreciate it if someone could look into this error and find out why a local variable goes missing when compiling Scala code with Semeru JDK 11. Thank you.
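The report guesses that local variable 9 is tied to the `try` block. As a hedged illustration of why that is plausible, here is a small Java sketch (not the Kafka Scala source; the class, method, and resource names are hypothetical) of a `try`/`finally`. On the exceptional path, compilers such as javac typically store the in-flight exception in a synthetic local (`astore <n>`), run the `finally` body, then reload that local (`aload <n>`) and rethrow it with `athrow`. The verifier requires the operand of `athrow` to be assignable to `java.lang.Throwable`, so reloading the wrong slot (for example slot 0) would be rejected in the way the report describes.

```java
public class FinallyRethrowSketch {
    /** Hypothetical resource; records whether the finally block ran. */
    static final class TrackedResource {
        boolean closed = false;

        void close() {
            closed = true;
        }
    }

    /**
     * Returns 42 on the normal path. On the exceptional path the pending
     * exception is kept in a compiler-generated local while close() runs,
     * and is then rethrown unchanged.
     */
    static int useResource(TrackedResource resource, boolean fail) {
        try {
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
            return 42;
        } finally {
            resource.close();
        }
    }

    public static void main(String[] args) {
        TrackedResource ok = new TrackedResource();
        System.out.println(useResource(ok, false) + " closed=" + ok.closed);

        TrackedResource failing = new TrackedResource();
        try {
            useResource(failing, true);
        } catch (IllegalStateException e) {
            System.out.println("rethrown: " + e.getMessage()
                    + " closed=" + failing.closed);
        }
    }
}
```

Compiling this class and running `javap -c FinallyRethrowSketch` should show the store, reload, and `athrow` pattern in the exception handler of `useResource`; the exact slot number depends on the compiler and the method's other locals.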