Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent StepExecutionIterator from leaking memory in cases where a single processed execution has a stuck CPS VM thread #347

Merged
merged 6 commits into from
Aug 9, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -253,24 +253,26 @@

for (FlowExecution e : FlowExecutionList.get()) {
ListenableFuture<List<StepExecution>> execs = e.getCurrentExecutions(false);
all.add(execs);
Futures.addCallback(execs,new FutureCallback<List<StepExecution>>() {
@Override
public void onSuccess(@NonNull List<StepExecution> result) {
for (StepExecution e : result) {
try {
f.apply(e);
} catch (RuntimeException x) {
LOGGER.log(Level.WARNING, null, x);
}
// We transform the futures that return List<StepExecution> into futures that return Void before
// passing them to Futures.allAsList so that the combined future only holds strong references to each
// FlowExecution until its StepExecutions have been loaded and applied to the function. This prevents
// us from leaking references to all processed executions in the case where a single build has a stuck
// CpsVmExecutorService that prevents the future returned by getCurrentExecutions from completing.
ListenableFuture<Void> results = Futures.transform(execs, (List<StepExecution> result) -> {
for (StepExecution se : result) {
try {
f.apply(se);
} catch (RuntimeException x) {
LOGGER.log(Level.WARNING, null, x);
}
}

@Override
public void onFailure(@NonNull Throwable t) {
LOGGER.log(Level.WARNING, null, t);
}
return null;
}, MoreExecutors.directExecutor());
ListenableFuture<Void> resultsWithWarningsLogged = Futures.catching(results, Throwable.class, t -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes. Project Loom would be very welcome in code like this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking more carefully through Guava's Javadoc I think I can use FluentFuture to simplify this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually FluentFuture doesn't really make things any clearer, so I will leave it.

LOGGER.log(Level.WARNING, null, t);
return null;

Check warning on line 273 in src/main/java/org/jenkinsci/plugins/workflow/flow/FlowExecutionList.java

View check run for this annotation

ci.jenkins.io / Code Coverage

Not covered lines

Lines 272-273 are not covered by tests
}, MoreExecutors.directExecutor());
all.add(resultsWithWarningsLogged);
}

return Futures.allAsList(all);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -38,12 +38,14 @@
import hudson.model.TaskListener;
import hudson.model.queue.QueueTaskFuture;
import java.io.Serializable;
import java.lang.ref.WeakReference;
import java.time.Duration;
import java.time.Instant;
import java.util.Collections;
import java.util.Set;
import java.util.function.Supplier;
import java.util.logging.Level;
import jenkins.model.Jenkins;
import org.hamcrest.Matcher;
import org.jenkinsci.plugins.workflow.cps.CpsFlowDefinition;
import org.jenkinsci.plugins.workflow.job.WorkflowJob;
Expand All @@ -54,13 +56,16 @@
import org.jenkinsci.plugins.workflow.steps.StepExecution;
import org.jenkinsci.plugins.workflow.test.steps.SemaphoreStep;
import org.junit.ClassRule;
import org.junit.Ignore;
import org.junit.Test;
import org.junit.Rule;
import org.jvnet.hudson.test.BuildWatcher;
import org.jvnet.hudson.test.Issue;
import org.jvnet.hudson.test.LoggerRule;
import org.jvnet.hudson.test.JenkinsSessionRule;
import org.jvnet.hudson.test.MemoryAssert;
import org.jvnet.hudson.test.TestExtension;
import org.jvnet.hudson.test.recipes.LocalData;
import org.kohsuke.stapler.DataBoundConstructor;

public class FlowExecutionListTest {
Expand Down Expand Up @@ -160,6 +165,62 @@
});
}

@Ignore("Build never completes due to infinite loop")
@LocalData
@Test public void stepExecutionIteratorDoesNotLeakBuildsWhenOneIsStuck() throws Throwable {
// LocalData created using the following snippet while the build was waiting in the _second_ sleep, except
// for build.xml, which was captured during the sleep step. The StepEndNode for the stage was then adjusted to
// have its startId point to the timeout step's StepStartNode, creating a loop.
/*
sessions.then(r -> {
var stuck = r.createProject(WorkflowJob.class);
stuck.setDefinition(new CpsFlowDefinition("stage('stage') { sleep 30 }; timeout(time: 10) { sleep 30 }", true));
var b = stuck.scheduleBuild2(0).waitForStart();
System.out.println(b.getRootDir());
r.waitForCompletion(b);
});
*/
sessions.then(r -> {
var p = r.jenkins.getItemByFullName("test0", WorkflowJob.class);
var b = p.getBuildByNumber(1);
// TODO: Create and run a new build, call StepExecutionIterator.applyAll while it is running, and assert

Check warning on line 186 in src/test/java/org/jenkinsci/plugins/workflow/flow/FlowExecutionListTest.java

View check run for this annotation

ci.jenkins.io / Open Tasks Scanner

TODO

NORMAL: Create and run a new build, call StepExecutionIterator.applyAll while it is running, and assert
// that we do not leak a hard reference to it through the task queue for the CpsVmExecutorService for the
// stuck build and the futures inside of StepExecutionIteratorImpl.applyAll.
// TODO: Ideally, we would detect the issue with the stuck build and find some way to kill it.

Check warning on line 189 in src/test/java/org/jenkinsci/plugins/workflow/flow/FlowExecutionListTest.java

View check run for this annotation

ci.jenkins.io / Open Tasks Scanner

TODO

NORMAL: Ideally, we would detect the issue with the stuck build and find some way to kill it.
r.waitForCompletion(b);
});
}

@Ignore("MemoryAssert is unable to clear the SoftReference from AbstractLazyLoadRunMap. Heap dumps show a strong reference from the test execution thread as a local variable, which is not captured by MemoryAssert. Also needs to be moved to a class with no BuildWatcher.")
dwnusbaum marked this conversation as resolved.
Show resolved Hide resolved
@Test public void stepExecutionIteratorDoesNotLeakBuildsWhenOneIsStuck2() throws Throwable {
sessions.then(r -> {
WeakReference<Object> buildRef;
{
var stuck = r.createProject(WorkflowJob.class, "stuck");
stuck.setDefinition(new CpsFlowDefinition("parallel(one: { echo 'test message'; Thread.sleep(Integer.MAX_VALUE); })", false));
var stuckBuild = stuck.scheduleBuild2(0).waitForStart();

var notStuck = r.createProject(WorkflowJob.class, "not-stuck");
notStuck.setDefinition(new CpsFlowDefinition("semaphore('wait')", true));
var notStuckBuild = notStuck.scheduleBuild2(0).waitForStart();
buildRef = new WeakReference<>(notStuckBuild);

SemaphoreStep.waitForStart("wait/1", notStuckBuild);
r.waitForMessage("test message", stuckBuild);
Thread.sleep(1000); // We need Thread.sleep to be running in the CpsVmExecutorService.

// Make FlowExecutionList$StepExecutionIteratorImpl.applyAll submit a task to the CpsVmExecutorService
// for stuck #1 that will never complete, so the resulting future will never complete.
var future = StepExecution.applyAll(e -> null);

SemaphoreStep.success("wait/1", null);
r.waitForCompletion(notStuckBuild);
Jenkins.get().getQueue().clearLeftItems(); // Clean up the soft reference immediately.
}
MemoryAssert.assertGC(buildRef, true);
});
}

public static class NonResumableStep extends Step implements Serializable {
public static final long serialVersionUID = 1L;
@DataBoundConstructor
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
<?xml version='1.1' encoding='UTF-8'?>
<flow-build plugin="[email protected]_ec82f">
<queueId>1</queueId>
<timestamp>1722979974600</timestamp>
<startTime>1722979974615</startTime>
<duration>0</duration>
<charset>UTF-8</charset>
<keepLog>false</keepLog>
<execution class="org.jenkinsci.plugins.workflow.cps.CpsFlowExecution">
<result>SUCCESS</result>
<script>stage(&apos;stage&apos;) { sleep 30 }; timeout(time: 10) { sleep 30 }</script>
<loadedScripts class="linked-hash-map"/>
<durabilityHint>MAX_SURVIVABILITY</durabilityHint>
<timings class="map">
<entry>
<string>flowNode</string>
<long>101709541</long>
</entry>
<entry>
<string>classLoad</string>
<long>124446251</long>
</entry>
<entry>
<string>run</string>
<long>200459289</long>
</entry>
<entry>
<string>parse</string>
<long>166818958</long>
</entry>
<entry>
<string>saveProgram</string>
<long>57936125</long>
</entry>
</timings>
<internalCalls class="sorted-set"/>
<sandbox>true</sandbox>
<iota>5</iota>
<head>1:5</head>
dwnusbaum marked this conversation as resolved.
Show resolved Hide resolved
<start>2</start>
<done>false</done>
<resumeBlocked>false</resumeBlocked>
</execution>
<completed>false</completed>
<checkouts class="hudson.util.PersistedList"/>
</flow-build>
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
Started
ha:////4LSn6sjy3SQJvLr4M0Jcw5zYiu9jzzxAhCQhf5ciKGcGAAAAoh+LCAAAAAAAAP9tjTEOwjAQBM8BClpKHuFItIiK1krDC0x8GCfWnbEdkooX8TX+gCESFVvtrLSa5wtWKcKBo5UdUu8otU4GP9jS5Mixv3geZcdn2TIl9igbHBs2eJyx4YwwR1SwULBGaj0nRzbDRnX6rmuvydanHMu2V1A5c4MHCFXMWcf8hSnC9jqYxPTz/BXAFEIGsfuclm8zQVqFvQAAAA==[Pipeline] Start of Pipeline
ha:////4FLPcBhXXN+T+Uy5Bq+9NjiuG45smS/CK+MQ8sUKcTBsAAAApR+LCAAAAAAAAP9tjTEOwjAUQ3+KOrAycohUghExsUZZOEFIQkgb/d8mKe3EibgadyBQiQlLlmxL1nu+oE4RjhQdby12HpP2vA+jK4lPFLtroIm3dOGaMFGwXNpJkrGnpUrKFhaxClYC1hZ1oOTRZdiIVt1VExS65pxj2Q4CKm8GeAAThZxVzN8yR9jeRpMIf5y/AJj7DGxXvP/86jduZBmjwAAAAA==[Pipeline] stage
ha:////4MPOLP4Go7JuW3NkAKhksIfyE+NroTrcISNM8xfRFLQ8AAAApR+LCAAAAAAAAP9tjTEOwjAUQ3+KOrAycoh0gA0xsUZZOEFIQkgb/d8mKe3EibgadyBQiQlLlmxL1nu+oE4RjhQdby12HpP2vA+jK4lPFLtroIm3dOGaMFGwXNpJkrGnpUrKFhaxClYC1hZ1oOTRZdiIVt1VExS65pxj2Q4CKm8GeAAThZxVzN8yR9jeRpMIf5y/AJj7DGxXvP/86jfoP95RwAAAAA==[Pipeline] { (stage)
ha:////4E53KhegWm+s/q0TJkIC5MI9kTq62Eqnzz2Qdi1URRTJAAAAoh+LCAAAAAAAAP9tjTEOAiEURD9rLGwtPQTbaGWsbAmNJ0AWEZb8zwLrbuWJvJp3kLiJlZNMMm+a93rDOic4UbLcG+wdZu14DKOti0+U+lugiXu6ck2YKRguzSSpM+cFJRUDS1gDKwEbgzpQdmgLbIVXD9UGhba9lFS/o4DGdQM8gYlqLiqVL8wJdvexy4Q/z18BzLEA29ce4gdpL1fxvAAAAA==[Pipeline] sleep
Sleeping for 30 sec
ha:////4G8hLCAAqKEvMe92YhTNPJa4MSOZpWK2lhgTDgEHbCXUAAAAoh+LCAAAAAAAAP9tjTEOAiEURD9rLGwtPQTbGBtjZUtoPAGyiLDkfxZYdytP5NW8g8RNrJxkknnTvNcb1jnBiZLl3mDvMGvHYxhtXXyi1N8CTdzTlWvCTMFwaSZJnTkvKKkYWMIaWAnYGNSBskNbYCu8eqg2KLTtpaT6HQU0rhvgCUxUc1GpfGFOsLuPXSb8ef4KYI6xADvU7j8OXFZ7vAAAAA==[Pipeline] }
ha:////4AnScT3OQumBbV+luAyxvhEcCl/8MozDCcq/aC6iNLpjAAAAox+LCAAAAAAAAP9tjTEOAiEURD9rLGwtPQRbWFgYK1tC4wmQRYQl/7PAult5Iq/mHSRuYuUkk8yb5r3esM4JTpQs9wZ7h1k7HsNo6+ITpf4WaOKerlwTZgqGSzNJ6sx5QUnFwBLWwErAxqAOlB3aAlvh1UO1QaFtLyXV7yigcd0AT2CimotK5Qtzgt197DLhz/NXAHOMBdihdv8BHeBS2LwAAAA=[Pipeline] // stage
ha:////4PTj6M4JscP6Gdk49EfTAaLMCwLZYd9IOq+brFvOiJPAAAAAph+LCAAAAAAAAP9tjUEKwjAURH8qXbh16SFScCWIK7chG08QkxjThv/bJLVdeSKv5h2MFlw5MDAzMLznC+oU4UjR8dZi5zFpz/swupL4RLG7Bpp4SxeuCRMFy6WdJBl7WqqkbGERq2AlYG1RB0oeXYaNaNVdNUGha845lu0goPJmgAcwUchZxfwtc4TtbTSJ8Mf5C4C5z8B2xfvPr34DrZTeycAAAAA=[Pipeline] timeout
Timeout set to expire in 10 min
ha:////4M9FYx/jFzgoF1Ji4m6uzCtxEvJBQzBzYoKBBTbKepUTAAAApR+LCAAAAAAAAP9tjTEOwjAUQ3+KOrAycoh0BSEm1igLJwhJCGmj/9skpZ04EVfjDgQqMWHJkm3Jes8X1CnCkaLjrcXOY9Ke92F0JfGJYncNNPGWLlwTJgqWSztJMva0VEnZwiJWwUrA2qIOlDy6DBvRqrtqgkLXnHMs20FA5c0AD2CikLOK+VvmCNvbaBLhj/MXAHOfge2K959f/QbB16AVwAAAAA==[Pipeline] {
ha:////4O/MG/IybiYM4oG30m877qNjUwTyRLwWY87qTVAOsZwoAAAAoh+LCAAAAAAAAP9tjTEOwjAQBDdBFLSUPMKBEiEqWisNLzCJMU6su2BfSCpexNf4AxGRqNhqZ5p5vbFMEUeOTjWWWk+p8qoLvZueGji218CDaviiKqbEwarSDiXX9jRjyWIxL8ux0FhZqgInT06w1o15mCIYcsVZ4uQOGrmv73gi01NZTJQvjBGbW18npl/nbwBjJ8j2gny37T6VOYoyvQAAAA==[Pipeline] sleep
Sleeping for 30 sec
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
1232 5
1252
2157 8
2189
2789 10
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
<?xml version='1.1' encoding='UTF-8'?>
<Tag plugin="[email protected]_ed8a_573">
<node class="cps.n.StepAtomNode" plugin="[email protected]_2c931e7935">
<parentIds>
<string>9</string>
</parentIds>
<id>10</id>
<descriptorId>org.jenkinsci.plugins.workflow.steps.SleepStep</descriptorId>
</node>
<actions>
<cps.a.ArgumentsActionImpl plugin="[email protected]_2c931e7935">
<arguments>
<entry>
<string>time</string>
<long>30</long>
</entry>
</arguments>
<sensitiveVariables/>
<isUnmodifiedBySanitization>true</isUnmodifiedBySanitization>
</cps.a.ArgumentsActionImpl>
<wf.a.TimingAction>
<startTime>1722980005296</startTime>
</wf.a.TimingAction>
<s.a.LogStorageAction/>
</actions>
</Tag>
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
<?xml version='1.1' encoding='UTF-8'?>
<Tag plugin="[email protected]_ed8a_573">
<node class="org.jenkinsci.plugins.workflow.graph.FlowStartNode">
<parentIds/>
<id>2</id>
</node>
<actions>
<wf.a.TimingAction>
<startTime>1722979974868</startTime>
</wf.a.TimingAction>
</actions>
</Tag>
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
<?xml version='1.1' encoding='UTF-8'?>
<Tag plugin="[email protected]_ed8a_573">
<node class="cps.n.StepStartNode" plugin="[email protected]_2c931e7935">
<parentIds>
<string>2</string>
</parentIds>
<id>3</id>
<descriptorId>org.jenkinsci.plugins.workflow.support.steps.StageStep</descriptorId>
</node>
<actions>
<s.a.LogStorageAction/>
<cps.a.ArgumentsActionImpl plugin="[email protected]_2c931e7935">
<arguments>
<entry>
<string>name</string>
<string>stage</string>
</entry>
</arguments>
<sensitiveVariables/>
<isUnmodifiedBySanitization>true</isUnmodifiedBySanitization>
</cps.a.ArgumentsActionImpl>
<wf.a.TimingAction>
<startTime>1722979974972</startTime>
</wf.a.TimingAction>
</actions>
</Tag>
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
<?xml version='1.1' encoding='UTF-8'?>
<Tag plugin="[email protected]_ed8a_573">
<node class="cps.n.StepStartNode" plugin="[email protected]_2c931e7935">
<parentIds>
<string>3</string>
</parentIds>
<id>4</id>
<descriptorId>org.jenkinsci.plugins.workflow.support.steps.StageStep</descriptorId>
</node>
<actions>
<wf.a.BodyInvocationAction/>
<wf.a.LabelAction>
<displayName>stage</displayName>
</wf.a.LabelAction>
<wf.a.TimingAction>
<startTime>1722979974988</startTime>
</wf.a.TimingAction>
</actions>
</Tag>
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
<?xml version='1.1' encoding='UTF-8'?>
<Tag plugin="[email protected]_ed8a_573">
<node class="cps.n.StepAtomNode" plugin="[email protected]_2c931e7935">
<parentIds>
<string>4</string>
</parentIds>
<id>5</id>
<descriptorId>org.jenkinsci.plugins.workflow.steps.SleepStep</descriptorId>
</node>
<actions>
<cps.a.ArgumentsActionImpl plugin="[email protected]_2c931e7935">
<arguments>
<entry>
<string>time</string>
<long>30</long>
</entry>
</arguments>
<sensitiveVariables/>
<isUnmodifiedBySanitization>true</isUnmodifiedBySanitization>
</cps.a.ArgumentsActionImpl>
<wf.a.TimingAction>
<startTime>1722979975077</startTime>
</wf.a.TimingAction>
<s.a.LogStorageAction/>
</actions>
</Tag>
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
<?xml version='1.1' encoding='UTF-8'?>
<Tag plugin="[email protected]_ed8a_573">
<node class="cps.n.StepEndNode" plugin="[email protected]_2c931e7935">
<parentIds>
<string>5</string>
</parentIds>
<id>6</id>
<startId>4</startId>
</node>
<actions>
<wf.a.BodyInvocationAction/>
<wf.a.TimingAction>
<startTime>1722980005117</startTime>
</wf.a.TimingAction>
</actions>
</Tag>
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
<?xml version='1.1' encoding='UTF-8'?>
<Tag plugin="[email protected]_ed8a_573">
<node class="cps.n.StepEndNode" plugin="[email protected]_2c931e7935">
<parentIds>
<string>6</string>
</parentIds>
<id>7</id>
<startId>8</startId> <!-- Manually edited to simulate corruption observed in the real world. -->
dwnusbaum marked this conversation as resolved.
Show resolved Hide resolved
</node>
<actions>
<wf.a.TimingAction>
<startTime>1722980005177</startTime>
</wf.a.TimingAction>
</actions>
</Tag>
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
<?xml version='1.1' encoding='UTF-8'?>
<Tag plugin="[email protected]_ed8a_573">
<node class="cps.n.StepStartNode" plugin="[email protected]_2c931e7935">
<parentIds>
<string>7</string>
</parentIds>
<id>8</id>
<descriptorId>org.jenkinsci.plugins.workflow.steps.TimeoutStep</descriptorId>
</node>
<actions>
<s.a.LogStorageAction/>
<cps.a.ArgumentsActionImpl plugin="[email protected]_2c931e7935">
<arguments>
<entry>
<string>time</string>
<int>10</int>
</entry>
</arguments>
<sensitiveVariables/>
<isUnmodifiedBySanitization>true</isUnmodifiedBySanitization>
</cps.a.ArgumentsActionImpl>
<wf.a.TimingAction>
<startTime>1722980005227</startTime>
</wf.a.TimingAction>
</actions>
</Tag>
Loading
Loading