Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRIU tests get accurate pre-checkpoint time #18425

Merged
merged 1 commit into from
Nov 15, 2023

Conversation

JasonFengJ9
Copy link
Member

CRIU tests get accurate pre-checkpoint time

Use CRIUTestUtils.prepareCheckPointJVM() to avoid potential long disk file-related API calls before taking the pre-checkpoint time.

100x grinder - no failure reproduced, a few infra errors.

closes #18384

Signed-off-by: Jason Feng [email protected]

@JasonFengJ9 JasonFengJ9 added comp:test criu Used to track CRIU snapshot related work labels Nov 9, 2023
@JasonFengJ9 JasonFengJ9 requested a review from tajila November 9, 2023 00:19
@tajila
Copy link
Contributor

tajila commented Nov 9, 2023

jenkins test sanity xlinux jdk21

@JasonFengJ9
Copy link
Member Author

Timeout has been exceeded
org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: 9c16731a-774a-4891-972b-01f0fe65078b
Timeout has been exceeded

This is an infra issue.

@tajila
Copy link
Contributor

tajila commented Nov 13, 2023

jenkins test sanity zlinux jdk21

@JasonFengJ9
Copy link
Member Author

 [OUT] 01:24:42.760*0x69800           j9jcl.533      - RuntimeMXBeanImpl_getUptimeImpl timeNow (1699925082758) vmStartTime (1699925080646) criuTimeDeltaMillis (0)
 [OUT] Performing CRIUSupport.checkpointJVM(), current thread name: main, Mon Nov 13 20:24:42 EST 2023, System.currentTimeMillis(): 1699925083378, System.nanoTime(): 1699925083375922784
 [OUT] 01:24:44.241 0x69800          j9criu.9        > Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl
 [OUT] 01:24:44.246 0x69800          j9criu.18       - Taking a checkpoint with active clinit
 [OUT] 01:24:44.329 0x69800          j9criu.13       - Before checkpoint, checkpointNanoTimeMonotonic = 1699925084326791605, checkpointNanoUTCTime = 1699925084330680443
 [OUT] 01:24:44.701 0x69800          j9criu.12       - Current syslogOptions: error,vital
 [OUT] 01:24:44.702 0x69800          j9criu.7        - Before checkpoint criu_dump(), j9time_nano_time() returns 1699925084699118946, j9time_current_time_nanos() returns 1699925084703007943

The time between first RuntimeMXBeanImpl_getUptimeImpl and Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl entry is ~1.5s, ~2s before checkpoint criu_dump() in rh7-390-1.
Investigating where the time was consumed after moving out the local directory creation API calls.

@JasonFengJ9
Copy link
Member Author

Unable to reproduce the failure in a grinder, the test is modified to check if the PR build picks up the change.
@tajila Could you launch another PR test?

@tajila
Copy link
Contributor

tajila commented Nov 14, 2023

jenkins test sanity zlinux jdk21

@JasonFengJ9
Copy link
Member Author

In a large grinder (500x), a failure occurred with the time between first RuntimeMXBeanImpl_getUptimeImpl and Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl was ~3s which is much larger than the adjustMillis = 1500ms

23:36:07   [OUT] 04:36:00.184*0x69800          j9jcl.533      - RuntimeMXBeanImpl_getUptimeImpl timeNow (1700022960179) vmStartTime (1700022950090) criuTimeDeltaMillis (0)
23:36:07   [OUT] 04:36:03.278 0x69800          j9criu.9        > Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl
23:36:07   [OUT] 04:36:03.482 0x69800          j9criu.7        - Before checkpoint criu_dump(), j9time_nano_time() returns 1700022963478393943, j9time_current_time_nanos() returns 1700022963482488807

Removed the comparison for now until find a reliable way to measure the time between two RuntimeMXBean.getUptime() across C/R.

Passed PR builds:
https://openj9-jenkins.osuosl.org/job/Build_JDK21_s390x_linux_Personal/118/
https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.functional_s390x_linux_Personal/62/

@tajila
Copy link
Contributor

tajila commented Nov 15, 2023

jenkins test sanity zlinux jdk21

Use CRIUTestUtils.prepareCheckPointJVM() to avoid potential long disk
file related API calls before taking the pre-checkpoint time.

Signed-off-by: Jason Feng <[email protected]>
@JasonFengJ9
Copy link
Member Author

Just resolved the merging conflict.
@tajila could you re-launch the PR build?

@tajila
Copy link
Contributor

tajila commented Nov 15, 2023

jenkins test sanity zlinux jdk21

@tajila tajila merged commit 744bf55 into eclipse-openj9:master Nov 15, 2023
@JasonFengJ9 JasonFengJ9 deleted the mxbeanuptime branch November 15, 2023 20:30
@pshipton
Copy link
Member

Failure in 0.43 build, https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.functional_s390x_linux_Release/82/
Should this fix be delivered to 0.42, 0.43?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:test criu Used to track CRIU snapshot related work
Projects
None yet
3 participants