[JENKINS-49707] Allow node blocks from deleted pods to be retried (full version) #1083

Closed
45 commits (changes shown from 38 commits)
d1a7c49
Pick up https://github.com/jenkinsci/workflow-durable-task-step-plugi…
jglick Dec 6, 2021
8fa0aab
Some `@CheckForNull`s
jglick Dec 6, 2021
36554ae
`ContainerExecDecorator.ws` no longer unused, but now actively harmfu…
jglick Dec 6, 2021
b34d34a
`Reaper` was activated only by `onOnline`, making it useless for clea…
jglick Dec 6, 2021
178a4d9
`RestartPipelineTest.terminatedPodAfterRestart` improvements: logging…
jglick Dec 6, 2021
748f4ea
Removing comment about `Reaper` rendered incorrect by #714
jglick Dec 6, 2021
9651786
`RestartPipelineTest.terminatedPodAfterRestart` overriding `terminati…
jglick Dec 6, 2021
9395b2f
Implementing `ExecutorStepRetryEligibility`
jglick Dec 6, 2021
6066949
Merge branch 'deps' into retry-JENKINS-49707
jglick Dec 7, 2021
0bbb646
Merge branch 'deps' into retry-JENKINS-49707
jglick Dec 7, 2021
c2e7014
Merge branch 'KubernetesPipelineTest.cascadingDelete' into retry-JENK…
jglick Dec 7, 2021
1433875
`KubernetesPipelineTest.terminatedPod` is analogous to `RestartPipeli…
jglick Dec 7, 2021
5d14a7e
Merge branch 'master' of https://github.com/jenkinsci/kubernetes-plug…
jglick Dec 7, 2021
e83d18c
Merge branch 'master' of https://github.com/jenkinsci/kubernetes-plug…
jglick Dec 8, 2021
dff6ce4
Pick up https://github.com/jenkinsci/workflow-step-api-plugin/pull/73
jglick Dec 8, 2021
91c0b4b
`KubernetesRetryEligibility` makes more sense in the `pipeline` subpa…
jglick Dec 8, 2021
7884be6
Trying to fix `KubernetesPipelineTest.containerTerminated` by skippin…
jglick Dec 8, 2021
0f00e23
Delaying `Reaper.activate` seems to help? https://github.com/jenkinsc…
jglick Dec 8, 2021
d802e29
Making `KubernetesPipelineTest.podDeadlineExceeded` pass
jglick Dec 8, 2021
d54a047
Typo in `IGNORED_CONTAINER_TERMINATION_REASONS`
jglick Dec 8, 2021
f45d69f
https://github.com/jenkinsci/workflow-step-api-plugin/pull/73 released
jglick Dec 8, 2021
0d391e7
`RestartPipelineTest.terminatedPodAfterRestart` requires https://gith…
jglick Dec 8, 2021
fefb808
Merge branch 'master' of https://github.com/jenkinsci/kubernetes-plug…
jglick Dec 9, 2021
494b726
Picking up https://github.com/jenkinsci/workflow-durable-task-step-pl…
jglick Dec 10, 2021
38e6f5d
Merge branch 'master' of https://github.com/jenkinsci/kubernetes-plug…
jglick Dec 10, 2021
2c88527
Adapting to https://github.com/jenkinsci/workflow-durable-task-step-p…
jglick Dec 14, 2021
f7a71a2
Merge branch 'master' of https://github.com/jenkinsci/kubernetes-plug…
jglick Dec 14, 2021
0a0bb28
Merge branch 'master' of https://github.com/jenkinsci/kubernetes-plug…
jglick Jan 10, 2022
2c08b80
Merge branch 'master' of https://github.com/jenkinsci/kubernetes-plug…
jglick Apr 28, 2022
2141a03
Initial work with `KubernetesAgentErrorCondition`
jglick May 2, 2022
71510c6
Pick up incremental builds
jglick May 3, 2022
b9174f9
Merge branch 'master' of https://github.com/jenkinsci/kubernetes-plug…
jglick May 12, 2022
52c55ff
Comment
jglick May 12, 2022
417f590
Updating deps
jglick May 12, 2022
2f5c297
Expiring `terminationReasons` entries after a day https://github.com/…
jglick May 12, 2022
d918d8b
SpotBugs
jglick May 12, 2022
13853bc
Merge branch 'gitHubRepo' into retry-JENKINS-49707
jglick May 13, 2022
5f53f7f
Merge branch 'master' of https://github.com/jenkinsci/kubernetes-plug…
jglick May 18, 2022
5d2e4d8
`errorConditions` → `conditions`
jglick May 19, 2022
e912f1c
Pick up https://github.com/jenkinsci/pipeline-model-definition-plugin…
jglick May 25, 2022
da10da2
Got an incremental deployment of https://github.com/jenkinsci/pipelin…
jglick May 26, 2022
3347882
Merge branch 'master' of https://github.com/jenkinsci/kubernetes-plug…
jglick Jun 3, 2022
6172a1b
Merge branch 'retry-JENKINS-49707-base' into retry-JENKINS-49707
jglick Jun 7, 2022
c09a686
Merge branch 'retry-JENKINS-49707-base' into retry-JENKINS-49707
jglick Jun 10, 2022
46b9e7c
Merge branch 'master' of https://github.com/jenkinsci/kubernetes-plug…
jglick Jul 15, 2022
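
Commit 2f5c297 in the list above mentions expiring `terminationReasons` entries after a day. The corresponding `Reaper` change is not shown on this page, so the following is only a minimal sketch of one way such expiry could work, assuming a hypothetical `ExpiringReasonsSketch` class with an explicit clock parameter; the actual plugin may use a different mechanism entirely.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; not the plugin's Reaper implementation. */
final class ExpiringReasonsSketch {

    // One day, matching the commit message; the real mechanism may differ.
    static final long TTL_MILLIS = TimeUnit.DAYS.toMillis(1);

    private static final class Entry {
        final Set<String> reasons = ConcurrentHashMap.newKeySet();
        volatile long lastUpdatedMillis;
    }

    private final Map<String, Entry> byNode = new ConcurrentHashMap<>();

    /** Records a termination reason for a node and refreshes its timestamp. */
    void add(String node, String reason, long nowMillis) {
        Entry e = byNode.computeIfAbsent(node, k -> new Entry());
        e.reasons.add(reason);
        e.lastUpdatedMillis = nowMillis;
    }

    /** Drops entries not updated within the TTL, then returns a copy of the reasons for the node. */
    Set<String> get(String node, long nowMillis) {
        byNode.values().removeIf(e -> nowMillis - e.lastUpdatedMillis >= TTL_MILLIS);
        Entry e = byNode.get(node);
        return e == null ? Collections.<String>emptySet() : new HashSet<>(e.reasons);
    }

    public static void main(String[] args) {
        ExpiringReasonsSketch sketch = new ExpiringReasonsSketch();
        sketch.add("agent-1", "OOMKilled", 0L);
        System.out.println(sketch.get("agent-1", 1000L));      // prints [OOMKilled]
        System.out.println(sketch.get("agent-1", TTL_MILLIS)); // prints []
    }
}
```

Passing the current time as a parameter keeps the sketch deterministic; a real implementation would more likely read the system clock or rely on a cache library with built-in expiry.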
29 changes: 20 additions & 9 deletions pom.xml
@@ -46,7 +46,7 @@
 <jenkins.host.address />
 <slaveAgentPort />
 <java.level>8</java.level>
-<jenkins.version>2.303.3</jenkins.version>
+<jenkins.version>2.332.1</jenkins.version>
 <no-test-jar>false</no-test-jar>
 <useBeta>true</useBeta>
 <gitHubRepo>jenkinsci/${project.artifactId}-plugin</gitHubRepo>
@@ -105,6 +105,7 @@
 <dependency>
   <groupId>org.jenkins-ci.plugins.workflow</groupId>
   <artifactId>workflow-api</artifactId>
+  <version>1159.v27cb_4545c3ff</version> <!-- TODO https://github.com/jenkinsci/workflow-api-plugin/pull/217 -->
 </dependency>
 <dependency> <!-- DeclarativeAgent -->
   <groupId>org.jenkinsci.plugins</groupId>
@@ -120,6 +121,7 @@
 <dependency>
   <groupId>org.jenkins-ci.plugins.workflow</groupId>
   <artifactId>workflow-cps</artifactId>
+  <version>2691.va_688a_c3d8fd0</version> <!-- TODO https://github.com/jenkinsci/workflow-cps-plugin/pull/534 -->
   <optional>true</optional>
 </dependency>
 <dependency>
@@ -135,21 +137,23 @@
   <groupId>org.jenkins-ci.plugins</groupId>
   <artifactId>credentials-binding</artifactId>
 </dependency>
+<dependency>
+  <groupId>org.jenkins-ci.plugins.workflow</groupId>
+  <artifactId>workflow-durable-task-step</artifactId>
+  <version>1200.v7231de192754</version> <!-- TODO https://github.com/jenkinsci/workflow-durable-task-step-plugin/pull/180 -->
+</dependency>
 
 <!-- for testing -->
 <dependency>
   <groupId>org.jenkins-ci.plugins.workflow</groupId>
   <artifactId>workflow-job</artifactId>
+  <version>1181.vcea_0362753c3</version> <!-- TODO https://github.com/jenkinsci/workflow-job-plugin/pull/260 -->
   <scope>test</scope>
 </dependency>
 <dependency>
   <groupId>org.jenkins-ci.plugins.workflow</groupId>
   <artifactId>workflow-basic-steps</artifactId>
-  <scope>test</scope>
-</dependency>
-<dependency>
-  <groupId>org.jenkins-ci.plugins.workflow</groupId>
-  <artifactId>workflow-durable-task-step</artifactId>
+  <version>960.v0004499239c3</version> <!-- TODO https://github.com/jenkinsci/workflow-basic-steps-plugin/pull/195 -->
   <scope>test</scope>
 </dependency>
 <dependency> <!-- SemaphoreStep -->
@@ -167,6 +171,7 @@
   <groupId>org.jenkins-ci.plugins.workflow</groupId>
   <artifactId>workflow-cps</artifactId>
   <classifier>tests</classifier>
+  <version>2691.va_688a_c3d8fd0</version> <!-- TODO https://github.com/jenkinsci/workflow-cps-plugin/pull/534 -->
   <scope>test</scope>
 </dependency>
 <dependency>
@@ -236,6 +241,12 @@
   <groupId>io.jenkins.configuration-as-code</groupId>
   <artifactId>test-harness</artifactId>
   <scope>test</scope>
+  <exclusions>
+    <exclusion> <!-- TODO bom bug? -->
+      <groupId>org.jenkins-ci.main</groupId>
+      <artifactId>jenkins-test-harness</artifactId>
+    </exclusion>
+  </exclusions>
 </dependency>
 <dependency>
   <groupId>org.jenkins-ci.plugins</groupId>
@@ -256,8 +267,8 @@
 <dependencies>
   <dependency>
     <groupId>io.jenkins.tools.bom</groupId>
-    <artifactId>bom-2.303.x</artifactId>
-    <version>1090.v0a_33df40457a_</version>
+    <artifactId>bom-2.332.x</artifactId>
+    <version>1370.vfa_e23fe119c3</version>
     <scope>import</scope>
     <type>pom</type>
   </dependency>
@@ -276,7 +287,7 @@
 <dependency><!-- pipeline-model-extensions vs. io.jenkins.configuration-as-code:test-harness -->
   <groupId>joda-time</groupId>
   <artifactId>joda-time</artifactId>
-  <version>2.10.2</version>
+  <version>2.10.5</version>
 </dependency>
 <dependency><!-- io.jenkins:configuration-as-code vs. org.jenkins-ci.plugins:junit -->
   <groupId>org.apache.commons</groupId>
New file: KubernetesAgentErrorCondition.java
@@ -0,0 +1,127 @@
+/*
+ * Copyright 2021 CloudBees, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.csanchez.jenkins.plugins.kubernetes.pipeline;
+
+import hudson.Extension;
+import hudson.ExtensionList;
+import hudson.model.Node;
+import hudson.model.labels.LabelAtom;
+import java.io.IOException;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.logging.Logger;
+import jenkins.model.Jenkins;
+import org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud;
+import org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave;
+import org.csanchez.jenkins.plugins.kubernetes.pod.retention.Reaper;
+import org.jenkinsci.Symbol;
+import org.jenkinsci.plugins.workflow.actions.ErrorAction;
+import org.jenkinsci.plugins.workflow.actions.WorkspaceAction;
+import org.jenkinsci.plugins.workflow.flow.ErrorCondition;
+import org.jenkinsci.plugins.workflow.flow.FlowExecution;
+import org.jenkinsci.plugins.workflow.graph.BlockEndNode;
+import org.jenkinsci.plugins.workflow.graph.FlowNode;
+import org.jenkinsci.plugins.workflow.graphanalysis.LinearBlockHoppingScanner;
+import org.jenkinsci.plugins.workflow.steps.FlowInterruptedException;
+import org.jenkinsci.plugins.workflow.steps.StepContext;
+import org.jenkinsci.plugins.workflow.support.steps.AgentErrorCondition;
+import org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution;
+import org.kohsuke.stapler.DataBoundConstructor;
+
+/**
+ * Qualifies {@code node} blocks associated with {@link KubernetesSlave} to be retried if the node was deleted.
+ * A more specific version of {@link AgentErrorCondition}.
+ */
+public class KubernetesAgentErrorCondition extends ErrorCondition {
+
+    private static final Logger LOGGER = Logger.getLogger(KubernetesAgentErrorCondition.class.getName());
+
+    private static final Set<String> IGNORED_CONTAINER_TERMINATION_REASONS = new HashSet<>();
+    static {
+        IGNORED_CONTAINER_TERMINATION_REASONS.add("OOMKilled");
+        IGNORED_CONTAINER_TERMINATION_REASONS.add("Completed");
+        IGNORED_CONTAINER_TERMINATION_REASONS.add("DeadlineExceeded");
+    }
+
+    @DataBoundConstructor public KubernetesAgentErrorCondition() {}
+
+    @Override
+    public boolean test(Throwable t, StepContext context) throws IOException, InterruptedException {
+        if (context == null) {
+            LOGGER.fine("Cannot check error without context");
+            return false;
+        }
+        if (!new AgentErrorCondition().test(t, context)) {
+            if (t instanceof FlowInterruptedException && ((FlowInterruptedException) t).getCauses().stream().anyMatch(ExecutorStepExecution.QueueTaskCancelled.class::isInstance)) {
+                LOGGER.fine(() -> "QueueTaskCancelled normally ignored by AgentErrorCondition but might be delivered here from Reaper.TerminateAgentOnContainerTerminated");
+                // TODO cleaner to somehow suppress that QueueTaskCancelled and let the underlying RemovedNodeCause be delivered
+                // (or just let AgentErrorCondition trigger on QueueTaskCancelled)
+            } else {
+                LOGGER.fine(() -> "Not a recognized failure: " + t);
+                return false;
+            }
+        }
+        FlowNode _origin = ErrorAction.findOrigin(t, context.get(FlowExecution.class));
+        if (_origin == null) {
+            LOGGER.fine(() -> "No recognized origin of error: " + t);
+            return false;
+        }
+        FlowNode origin = _origin instanceof BlockEndNode ? ((BlockEndNode) _origin).getStartNode() : _origin;
+        LOGGER.fine(() -> "Found origin " + origin + " " + origin.getDisplayFunctionName());
+        LinearBlockHoppingScanner scanner = new LinearBlockHoppingScanner();
+        scanner.setup(origin);
+        for (FlowNode callStack : scanner) {
+            WorkspaceAction ws = callStack.getPersistentAction(WorkspaceAction.class);
+            if (ws != null) {
+                String node = ws.getNode();
+                Node n = Jenkins.get().getNode(node);
+                if (n != null) {
+                    if (!(n instanceof KubernetesSlave)) {
+                        LOGGER.fine(() -> node + " was not a K8s agent");
+                        return false;
+                    }
+                } else {
+                    // May have been removed already, but we can look up the labels to see what it was.
+                    Set<LabelAtom> labels = ws.getLabels();
+                    if (labels.stream().noneMatch(l -> Jenkins.get().clouds.stream().anyMatch(c -> c instanceof KubernetesCloud && ((KubernetesCloud) c).getTemplate(l) != null))) {
+                        LOGGER.fine(() -> node + " was not a K8s agent judging by " + labels);
+                        return false;
+                    }
+                }
+                Set<String> terminationReasons = ExtensionList.lookupSingleton(Reaper.class).terminationReasons(node);
+                if (terminationReasons.stream().anyMatch(r -> IGNORED_CONTAINER_TERMINATION_REASONS.contains(r))) {
+                    LOGGER.fine(() -> "ignored termination reason(s) for " + node + ": " + terminationReasons);
+                    return false;
+                }
+                LOGGER.fine(() -> "active on " + node + " (termination reasons: " + terminationReasons + ")");
+                return true;
+            }
+        }
+        LOGGER.fine(() -> "found no WorkspaceAction starting from " + origin);
+        return false;
+    }
+
+    @Symbol("kubernetesAgent")
+    @Extension public static final class DescriptorImpl extends ErrorConditionDescriptor {
+
+        @Override public String getDisplayName() {
+            return "Kubernetes agent errors";
+        }
+
+    }
+
+}
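
The last check in `test` can be read in isolation: once the failure has been traced to a Kubernetes agent, the block is eligible for retry only if none of the recorded container termination reasons is in the ignored set. Below is a minimal standalone sketch of that single decision. `RetryDecisionSketch` and `eligibleForRetry` are hypothetical names, a plain `Set<String>` stands in for the result of `Reaper.terminationReasons(node)`, and the agent and label checks are omitted, so this is not the plugin's code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration only; not part of the kubernetes plugin. */
final class RetryDecisionSketch {

    // Same three reasons as IGNORED_CONTAINER_TERMINATION_REASONS in the diff above.
    private static final Set<String> IGNORED =
            new HashSet<>(Arrays.asList("OOMKilled", "Completed", "DeadlineExceeded"));

    /** Returns true when no recorded termination reason is in the ignored set. */
    static boolean eligibleForRetry(Set<String> terminationReasons) {
        return terminationReasons.stream().noneMatch(IGNORED::contains);
    }

    public static void main(String[] args) {
        // No recorded reasons: eligible.
        System.out.println(eligibleForRetry(Collections.<String>emptySet())); // prints true
        // An ignored reason on its own: not eligible.
        System.out.println(eligibleForRetry(Collections.singleton("OOMKilled"))); // prints false
        // "Error" is just an example value; one ignored reason is enough to veto the retry.
        System.out.println(eligibleForRetry(new HashSet<>(Arrays.asList("Error", "Completed")))); // prints false
    }
}
```

Note that an empty set counts as eligible, which matches the diff: `anyMatch` on an empty stream is false, so `test` falls through to `return true`.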
@@ -84,7 +84,13 @@ protected Class<TaskListenerDecorator> type() {
 
     @Override
     protected TaskListenerDecorator get(DelegatedContext context) throws IOException, InterruptedException {
-        KubernetesComputer c = context.get(KubernetesComputer.class);
+        KubernetesComputer c;
+        try {
+            c = context.get(KubernetesComputer.class);
+        } catch (IOException | InterruptedException x) {
+            LOGGER.log(Level.FINE, "Unable to look up KubernetesComputer", x);
+            return null;
+        }
         if (c == null) {
             return null;
         }
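
The hunk above turns a failed `KubernetesComputer` lookup into "no decorator" instead of letting the exception propagate. The same pattern can be sketched without any Jenkins types. `LookupFallbackSketch`, `Lookup`, and `getOrNull` are hypothetical names introduced for illustration; only the catch-log-return-null shape is taken from the diff.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical illustration only; not part of the kubernetes plugin. */
final class LookupFallbackSketch {

    private static final Logger LOGGER = Logger.getLogger(LookupFallbackSketch.class.getName());

    /** Stand-in for a context lookup that may fail, such as context.get(...) in the diff. */
    interface Lookup<T> {
        T get() throws IOException, InterruptedException;
    }

    /** Returns the looked-up value, or null if the lookup throws; the failure is logged at FINE. */
    static <T> T getOrNull(Lookup<T> lookup) {
        try {
            return lookup.get();
        } catch (IOException | InterruptedException x) {
            LOGGER.log(Level.FINE, "Unable to complete lookup", x);
            return null;
        }
    }

    public static void main(String[] args) {
        String ok = getOrNull(() -> "computer");
        String failed = getOrNull(() -> { throw new IOException("agent is gone"); });
        System.out.println(ok + " " + failed); // prints computer null
    }
}
```

As in the diff, catching `InterruptedException` here does not restore the thread's interrupt status; whether that matters depends on the caller, and a stricter variant might call `Thread.currentThread().interrupt()` before returning.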