Hang when compilation fails after certain types of invalidation #20750

Closed
dsouzai opened this issue Dec 4, 2024 · 5 comments · Fixed by #20763

Comments

@dsouzai
Contributor

dsouzai commented Dec 4, 2024

A hang can occur when compilation consistently fails after a method has been invalidated in specific circumstances. Invalidation due to HCR is not affected. At present, there are two situations in which this can occur:

  1. invalidation due to prex
  2. invalidation when -Xnojit -Xnoaot is specified post-restore; this is especially noticeable under -XX:+DebugOnRestore (see cmdLineTester_criu_jitPostRestore Test -Xnojit -Xnoaot hang #20663).

The sequence of events is as follows:

  1. A method (foo) gets compiled.
  2. foo gets recompiled. This patches the startPC of the old body to jump to a helper that forwards execution to the new body.
  3. foo gets invalidated (because of prex or -Xnojit -Xnoaot post-restore), which patches the startPC of the recompiled foo to trigger a sync recompilation.
  4. The sync recompilation fails; this results in the startPC of the recompiled foo (from step 2) getting patched to revert to the interpreter.
  5. Once in the interpreter, the extra field of the J9Method of foo gets reset to 0x1 (J9_STARTPC_NOT_TRANSLATED)
  6. On the next (interpreted) invocation of foo, the interpreter triggers a compilation request (because the count is technically 0)
  7. This compilation also fails, which results in the extra getting set to -3 (J9_JIT_NEVER_TRANSLATE)
  8. Some compiled method calls the original body of foo (from step 1), because it contains a direct call to that body.
  9. The helper cannot determine the new body to forward execution to (because the extra of the J9Method of foo is set to -3), so it requests a compilation.
  10. The compilation request sequence has an early return in j9jit_testarossa_err, because compilation happens asynchronously and because linkageInfo->isBeingCompiled() returns true; this flag never gets reset, even after the recompilation succeeds in step 2.
  11. Execution returns to the startPC; return to step 9.
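
For illustration only, the loop in steps 9 to 11 can be modeled with a small, self-contained sketch. This is not OpenJ9 code: `ModelMethod`, `requestCompilation`, and `callOldBody` are hypothetical names, and only the two `extra` values (0x1 and -3) come from the description above. The model assumes that any other `extra` value is a usable startPC and that an accepted compilation request counts as progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The two values are quoted from the issue; everything else is a hypothetical model. */
#define J9_STARTPC_NOT_TRANSLATED ((intptr_t)0x1)
#define J9_JIT_NEVER_TRANSLATE    ((intptr_t)-3)

typedef struct {
    intptr_t extra;        /* stands in for the extra field of the J9Method */
    bool isBeingCompiled;  /* stands in for the stale linkageInfo flag */
} ModelMethod;

/* Step 10: the request returns early, without queuing anything, while the flag is set. */
static bool requestCompilation(const ModelMethod *m) {
    return !m->isBeingCompiled;
}

/*
 * Steps 9 to 11: the helper either forwards to a new body, gets a compilation
 * request accepted, or falls back to the startPC and tries again.
 * Returns the number of wasted round trips, capped at maxSpins.
 */
static int callOldBody(const ModelMethod *m, int maxSpins) {
    assert(m != NULL && maxSpins >= 0);
    for (int spins = 0; spins < maxSpins; spins++) {
        bool canForward = m->extra != J9_STARTPC_NOT_TRANSLATED
                       && m->extra != J9_JIT_NEVER_TRANSLATE;
        if (canForward || requestCompilation(m)) {
            return spins;  /* some progress is possible */
        }
        /* Nothing changed: execution returns to the startPC and re-enters the helper. */
    }
    return maxSpins;
}
```

With `extra` set to -3 and the flag stuck at true, no iteration changes any state, so `callOldBody` always runs to the cap; clearing either condition lets it return immediately.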

github-actions bot commented Dec 4, 2024

Issue Number: 20750
Status: Open
Recommended Components: comp:jit, comp:vm, comp:jitserver

@mpirvu
Contributor

mpirvu commented Dec 4, 2024

> Once in the interpreter, the extra field of the J9Method of foo gets reset to 0x1 (J9_STARTPC_NOT_TRANSLATED)

I remember that for failed sync compilations due to PREX we do not touch the extra field. I remember it because I thought it was odd to have a method being interpreted while the extra field said it's compiled. Maybe things have changed at some point.

@mpirvu
Contributor

mpirvu commented Dec 5, 2024

Looking at the code, on failure we call

#define J9JIT_REVERT_METHOD_TO_INTERPRETED(javaVM, method) \
	(javaVM)->internalVMFunctions->initializeMethodRunAddressNoHook((javaVM), (method))

and initializeMethodRunAddressNoHook() only touches j9method->methodRunAddress and leaves j9method->extra unchanged.
There is another function initializeMethodRunAddress that changes j9method->extra as well.

@dsouzai
Contributor Author

dsouzai commented Dec 5, 2024

> Looking at the code, on failure we call
>
> #define J9JIT_REVERT_METHOD_TO_INTERPRETED(javaVM, method) \
> 	(javaVM)->internalVMFunctions->initializeMethodRunAddressNoHook((javaVM), (method))
>
> and initializeMethodRunAddressNoHook() only touches j9method->methodRunAddress and leaves j9method->extra unchanged. There is another function initializeMethodRunAddress that changes j9method->extra as well.

This was very recently changed in #20387; initializeMethodRunAddressNoHook now does modify the j9method->extra, which is what exposed this hang.
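
As a rough illustration of why that change matters, the two revert behaviors discussed above can be contrasted in a toy model. This is not the VM implementation: `ModelJ9Method`, `revertKeepingExtra`, `revertResettingExtra`, and `extraStillNamesCompiledBody` are hypothetical names, and treating the low bit of `extra` as a "not translated" tag is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value quoted from the issue; its use as a tag bit below is an assumption. */
#define J9_STARTPC_NOT_TRANSLATED ((intptr_t)0x1)

typedef struct {
    const void *methodRunAddress;  /* where invocations of the method are sent */
    intptr_t extra;                /* a startPC, or a tagged "not translated" value */
} ModelJ9Method;

static const char interpreterEntry = 0;  /* stand-in for the interpreter entry point */

/* Behavior as described in the earlier comment: only the run address changes. */
static void revertKeepingExtra(ModelJ9Method *m) {
    assert(m != NULL);
    m->methodRunAddress = &interpreterEntry;
}

/* Behavior as described after #20387: the extra field is reset as well. */
static void revertResettingExtra(ModelJ9Method *m) {
    assert(m != NULL);
    m->methodRunAddress = &interpreterEntry;
    m->extra = J9_STARTPC_NOT_TRANSLATED;
}

/* Hypothetical check: can a forwarding helper still recover a compiled body from extra? */
static int extraStillNamesCompiledBody(const ModelJ9Method *m) {
    assert(m != NULL);
    return (m->extra & J9_STARTPC_NOT_TRANSLATED) == 0;
}
```

In this model, both variants send new invocations to the interpreter, but only the first leaves a startPC in `extra` for a forwarding helper to find. Once `extra` is reset, the later failed compilation can overwrite it with -3, which is the state the hang depends on.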


Issue Number: 20750
Status: Closed
Actual Components: bug, comp:jit
Actual Assignees: No one :(
PR Assignees: dsouzai
