Added new retention policy "onJobFailure" #1265

Open: wants to merge 1 commit into base: master
@@ -0,0 +1,224 @@
package org.csanchez.jenkins.plugins.kubernetes.pod.retention;

import hudson.Extension;
import hudson.model.*;
Review comment (Member): Please set your IDE to avoid wildcard imports.

import io.fabric8.kubernetes.api.model.Pod;
import jenkins.model.Jenkins;

import org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud;
import org.jenkinsci.Symbol;
import org.kohsuke.stapler.DataBoundConstructor;


import java.io.Serializable;
import java.time.Duration;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import java.util.stream.Collectors;

/**
* This pod retention policy keeps the pod from being terminated if the Jenkins
* job it's associated with fails.
*
* In case of any other result, including errors in determining the result, it
* will default to deleting the pod.
*/
public class OnJobFailure extends PodRetention implements Serializable {

private static final long serialVersionUID = -6422177946264212816L;

private static final Logger LOGGER = Logger.getLogger(OnJobFailure.class.getName());

private static final String MODULENAME = "OnJobFailure";

// small convenience function
private void LOG(Level level, String message) {
LOGGER.log(level, () -> MODULENAME + ": " + message);
}
Review comment (Member) on lines +38 to +41: Unneeded, the class name is already printed as part of the standard formatter.
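
To illustrate the two logging remarks in this review (no hand-written prefix, diagnostics at FINE), here is a minimal standalone sketch using only `java.util.logging`. `LoggingSketch` and `CapturingHandler` are hypothetical names invented for the example and are not part of the plugin; the sketch only shows that a log record already carries the logger name, so a prefix such as `MODULENAME` adds nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Standalone sketch; LoggingSketch and CapturingHandler are hypothetical names.
class LoggingSketch {
    private static final Logger LOGGER = Logger.getLogger(LoggingSketch.class.getName());

    /** Collects log records in memory so they can be inspected. */
    static final class CapturingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            records.add(record);
        }

        @Override
        public void flush() {
        }

        @Override
        public void close() {
        }
    }

    /** Logs one diagnostic message at FINE, without a prefix, and returns the record. */
    static LogRecord logOnce(String message) {
        CapturingHandler handler = new CapturingHandler();
        handler.setLevel(Level.ALL);
        LOGGER.setUseParentHandlers(false); // keep the example quiet on the console
        LOGGER.setLevel(Level.FINE);        // FINE is below the default INFO threshold
        LOGGER.addHandler(handler);
        try {
            // The Supplier overload defers building the message until it is needed.
            LOGGER.log(Level.FINE, () -> message);
        } finally {
            LOGGER.removeHandler(handler);
        }
        return handler.records.get(0);
    }

    public static void main(String[] args) {
        LogRecord record = logOnce("shouldDeletePod called without actual cloud and pod");
        System.out.println(record.getLoggerName() + " " + record.getLevel() + " " + record.getMessage());
    }
}
```

The record's logger name is the class name passed to `Logger.getLogger`, which is what a formatter can print alongside the message.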


@DataBoundConstructor
public OnJobFailure() {
}

@Override
public boolean shouldDeletePod(KubernetesCloud cloud, Pod pod) {
if (cloud == null || pod == null) {
LOG(Level.INFO, "shouldDeletePod called without actual cloud and pod");
Review comment (Member): Demote all this logging to FINE.

return true;
}

// Get the current Jenkins instance to access a list of all jobs
Jenkins jenkins = Jenkins.getInstanceOrNull();
if (jenkins == null) {
LOG(Level.INFO, "Couldn't get the current Jenkins reference");
return true;
}
Review comment (Member) on lines +55 to +59, suggested change:
- Jenkins jenkins = Jenkins.getInstanceOrNull();
- if (jenkins == null) {
- LOG(Level.INFO, "Couldn't get the current Jenkins reference");
- return true;
- }
+ Jenkins jenkins = Jenkins.get();


// All known jobs of the current Jenkins instance
List<Job> jobs = jenkins.getAllItems(Job.class);
if (jobs.isEmpty()) {
LOG(Level.INFO, "Jenkins doesn't have any jobs?");
return true;
}

// runUrl will be something like "job/<name>/<runId>/" or
// "job/<folder>/job/<name>/<runId>/" if nested;
// this is how we get our job name and run id
String runUrl = pod.getMetadata().getAnnotations().get("runUrl");
if (runUrl == null) {
LOG(Level.INFO, "The pod has no required 'runUrl' annotation");
return true;
}

// everything is in place, get the result
Result result = getResultForJob(runUrl, jobs);
if (result == null) {
// we couldn't get the result for some reason
LOG(Level.INFO, "Couldn't find the result for runUrl: " + runUrl);
return true;
}

// finally, delete only if successful
boolean delete = result.equals(Result.SUCCESS);
LOG(Level.FINE, "delete = " + delete);
return delete;
}

/**
* Split up the runUrl string and return the run id
*
* @param runUrl the "runUrl" annotation of the kubernetes pod
* @return the run id as a string
*/
public String getRunId(String runUrl) {
// extract the relevant parts
String[] parts = runUrl.split("/");

if (parts.length < 3) {
LOG(Level.INFO, "runUrl has unknown format: " + runUrl);
return null;
}
Review comment (Member) on lines +101 to +104: This is way too simplistic. If you need to look up the run, I would advise instead to add new annotations (one for the item full name, another for the build id).


return parts[parts.length - 1].trim();
}
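
The reviewer's suggestion on `getRunId` above (dedicated annotations instead of parsing `runUrl`) could look roughly like the following standalone sketch. The annotation keys `itemFullName` and `buildId` and the `RunRef` type are assumptions made up for illustration; the plugin in this diff only reads `runUrl`. A plain `Map` stands in for the pod's annotations.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Standalone sketch. The annotation keys and RunRef are hypothetical;
// the code under review only reads the "runUrl" annotation.
class RunAnnotationsSketch {
    static final String ITEM_FULL_NAME_KEY = "itemFullName"; // assumed key
    static final String BUILD_ID_KEY = "buildId";            // assumed key

    /** Identifies one build: the item's full name plus the build id. */
    static final class RunRef {
        final String itemFullName;
        final String buildId;

        RunRef(String itemFullName, String buildId) {
            this.itemFullName = itemFullName;
            this.buildId = buildId;
        }
    }

    /** Reads both annotations; empty if the map is null or a value is missing or blank. */
    static Optional<RunRef> fromAnnotations(Map<String, String> annotations) {
        if (annotations == null) {
            return Optional.empty();
        }
        String name = annotations.get(ITEM_FULL_NAME_KEY);
        String id = annotations.get(BUILD_ID_KEY);
        if (name == null || name.trim().isEmpty() || id == null || id.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new RunRef(name.trim(), id.trim()));
    }

    public static void main(String[] args) {
        Map<String, String> annotations = new HashMap<>();
        annotations.put(ITEM_FULL_NAME_KEY, "folder/jobname");
        annotations.put(BUILD_ID_KEY, "42");
        fromAnnotations(annotations)
                .ifPresent(ref -> System.out.println(ref.itemFullName + " #" + ref.buildId));
    }
}
```

With both values available, the lookup on the Jenkins side would presumably go through something like `Jenkins.getItemByFullName(...)` followed by `job.getBuild(...)`, rather than scanning every job and matching URLs.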

/**
* Filter the entire job list down to the one job that we're looking for
*
* @param runUrl the "runUrl" annotation of the kubernetes pod
* @param jobs the list of all Jenkins jobs
* @return the matching job, if successful, or null on error
*/
public Job getJob(String runUrl, List<Job> jobs) {
// strip the runId to enable matching by jobUrl
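Pattern pattern = Pattern.compile("(^job.+/)[0-9]+/?$");
Matcher matcher = pattern.matcher(runUrl);
// group(1) is only valid after a successful match
if (!matcher.matches()) {
LOG(Level.INFO, "runUrl has unknown format: " + runUrl);
return null;
}
String jobUrl = matcher.group(1);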

// find the jobs that match the shortened runUrl annotation
// it should be only one
List<Job> matchingJobs = jobs.stream().filter(t -> jobUrl.equals(t.getUrl())).collect(Collectors.toList());

// we expect to find exactly one job
if (matchingJobs.size() != 1) {
LOG(Level.INFO, "Expected exactly one matching job, found: " + matchingJobs.size());
return null;
}

return matchingJobs.get(0);
}
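
A note on the regex step in `getJob` above: `Matcher.group(int)` throws `IllegalStateException` unless `matches()` or `find()` has succeeded first. The following standalone sketch uses the same pattern as the diff and returns an empty `Optional` for input that does not look like a run URL. `JobUrlSketch` and `jobUrlOf` are hypothetical names used only here.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone sketch; JobUrlSketch and jobUrlOf are hypothetical names.
class JobUrlSketch {
    // Same pattern as in the diff: everything up to the trailing numeric segment.
    private static final Pattern RUN_URL = Pattern.compile("(^job.+/)[0-9]+/?$");

    /** Returns the job URL part of a run URL, or empty if the input has another shape. */
    static Optional<String> jobUrlOf(String runUrl) {
        if (runUrl == null) {
            return Optional.empty();
        }
        Matcher matcher = RUN_URL.matcher(runUrl);
        // The match must be attempted before any group is read.
        if (!matcher.matches()) {
            return Optional.empty();
        }
        return Optional.of(matcher.group(1));
    }

    public static void main(String[] args) {
        System.out.println(jobUrlOf("job/jobname/42/"));
        System.out.println(jobUrlOf("job/folder/job/jobname/42/"));
        System.out.println(jobUrlOf("not-a-run-url"));
    }
}
```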

/**
* Get the result for a particular Jenkins job
*
* @param runUrl the "runUrl" annotation of the kubernetes pod
* @param jobs the list of all Jenkins jobs
* @return the job results, if successful, or null on error
*/
public Result getResultForJob(String runUrl, List<Job> jobs) {
// get the id of this particular run
String runId = getRunId(runUrl);
if (runId == null) {
LOG(Level.INFO, "Couldn't get the runId");
return null;
}

// get a reference to the job that started the pod
Job job = getJob(runUrl, jobs);
if (job == null) {
LOG(Level.INFO, "Can't find the job for runUrl: " + runUrl);
return null;
}

// use job and runId to find the particular run
Run run = job.getBuild(runId);
if (run == null) {
LOG(Level.INFO, "Couldn't find the run for runUrl: " + runUrl);
return null;
}

// get the result
Result result = run.getResult();

// and then this sometimes happens: the run has finished and
// Jenkins asks if the pod should be deleted, but the result
// is actually still null. We just repeat querying for 30
// seconds and then abort if it's still not available
int maxRounds = 30; // arbitrary

while (result == null && maxRounds > 0) {
LOG(Level.FINE, "result == null, waiting...");

maxRounds--;

try {
Thread.sleep(Duration.ofSeconds(1).toMillis());
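} catch (InterruptedException e) {
LOG(Level.INFO, "Interrupted while waiting for the result: " + e.getMessage());
// restore the interrupt flag and stop waiting
Thread.currentThread().interrupt();
break;
}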

// retry getting the result
result = run.getResult();
}

// done
return result;
}
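
The wait-and-retry loop in `getResultForJob` above can be looked at in isolation. The sketch below is a generic bounded poll, not the plugin's code: `PollSketch` and `pollForValue` are hypothetical names, and a `Supplier` stands in for `run.getResult()`. It also shows one common way to handle `InterruptedException`: restore the interrupt flag and stop waiting.

```java
import java.util.function.Supplier;

// Standalone sketch; PollSketch and pollForValue are hypothetical names.
class PollSketch {
    /**
     * Queries the source once, then up to maxRetries more times with a pause
     * in between, until it yields a non-null value. Returns null if the value
     * never appears or the thread is interrupted while waiting.
     */
    static <T> T pollForValue(Supplier<T> source, int maxRetries, long delayMillis) {
        T value = source.get();
        int remaining = maxRetries;
        while (value == null && remaining > 0) {
            remaining--;
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up.
                Thread.currentThread().interrupt();
                return null;
            }
            value = source.get();
        }
        return value;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated source: null on the first two calls, then a value.
        String result = pollForValue(() -> ++calls[0] < 3 ? null : "SUCCESS", 5, 10L);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Note that a loop like this blocks the calling thread for up to `maxRetries * delayMillis`; whether that is acceptable depends on where the retention check runs.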

@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj == null) {
return false;
}
if (obj instanceof OnJobFailure) {
return true;
}
return false;
}

@Override
public int hashCode() {
return this.toString().hashCode();
}
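
One observation on the `hashCode` above: it is derived from `toString()`, which returns `Messages.on_Job_Failure()`. Assuming that message is localized (an assumption, the generated `Messages` class is not shown in this diff), the hash could differ between locales while `equals` treats all instances as equal. A standalone sketch of a stateless class whose hash does not depend on the display text follows; `StatelessPolicySketch` is a hypothetical name.

```java
// Standalone sketch; StatelessPolicySketch is a hypothetical stand-in for a
// policy class with no fields, where every instance is interchangeable.
class StatelessPolicySketch {
    @Override
    public boolean equals(Object obj) {
        // instanceof is false for null, so no separate null check is needed.
        return obj instanceof StatelessPolicySketch;
    }

    @Override
    public int hashCode() {
        // Constant per class and independent of any display string.
        return StatelessPolicySketch.class.getName().hashCode();
    }

    @Override
    public String toString() {
        return "display name, possibly localized";
    }

    public static void main(String[] args) {
        StatelessPolicySketch a = new StatelessPolicySketch();
        StatelessPolicySketch b = new StatelessPolicySketch();
        System.out.println(a.equals(b) + " " + (a.hashCode() == b.hashCode()));
    }
}
```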

@Override
public String toString() {
return Messages.on_Job_Failure();
}

@Extension
@Symbol("onJobFailure")
public static class DescriptorImpl extends PodRetentionDescriptor {
@Override
public String getDisplayName() {
return Messages.on_Job_Failure();
}
}
}
@@ -6,6 +6,7 @@
<ol>
<li>Never - always delete the agent pod.</li>
<li>On Failure - keep the agent pod if it fails during the build.</li>
<li>On Job Failure - keep the agent pod if the build itself fails.</li>
Review comment (Member): It should rather be called On Build Failure.

Review comment (Member): Also, in Jenkins terminology fails means FAILURE: the build did not run to normal completion. UNSTABLE (ran to completion but with test failures) would be considered “successful”. If you mean to use the current impl (checking for SUCCESS) then be clear that an unstable build will also retain the pod.

<li>Always - always keep the agent pod.</li>
</ol>
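
The second review comment on the On Job Failure entry above distinguishes two possible meanings of "fails". The following standalone sketch contrasts them. The `BuildResult` enum is a hypothetical stand-in for `hudson.model.Result`, with the constants listed from best to worst as an assumption of this example; neither method is the plugin's code.

```java
// Standalone sketch; BuildResult is a hypothetical stand-in for hudson.model.Result.
class ResultPolicySketch {
    // Assumed ordering: best result first, worst last.
    enum BuildResult { SUCCESS, UNSTABLE, FAILURE, NOT_BUILT, ABORTED }

    /** Mirrors the check in the diff: delete only on SUCCESS, so UNSTABLE retains the pod. */
    static boolean deleteOnlyOnSuccess(BuildResult result) {
        return result == BuildResult.SUCCESS;
    }

    /** Alternative reading: retain only for FAILURE or worse; UNSTABLE counts as successful. */
    static boolean deleteUnlessFailed(BuildResult result) {
        return result.ordinal() < BuildResult.FAILURE.ordinal();
    }

    public static void main(String[] args) {
        for (BuildResult result : BuildResult.values()) {
            System.out.println(result
                    + " deleteOnlyOnSuccess=" + deleteOnlyOnSuccess(result)
                    + " deleteUnlessFailed=" + deleteUnlessFailed(result));
        }
    }
}
```

The two variants differ only for UNSTABLE, which is the case the help text would need to spell out. With the real `Result` class, the second variant would presumably use its comparison helpers such as `isWorseOrEqualTo` instead of ordinals.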
<p>
@@ -24,3 +24,4 @@ always=Always
_default=Default
never=Never
on_Failure=On Failure
on_Job_Failure=On Job Failure
@@ -59,6 +59,23 @@ public void testOnFailurePodRetention() {
assertTrue(subject.shouldDeletePod(cloud, pod));
}

@Test
public void testOnJobFailurePodRetention() {
OnJobFailure subject = new OnJobFailure();

// regular
String runId = subject.getRunId("job/jobname/42/");
assertEquals("42", runId);

// nested
runId = subject.getRunId("job/jobname1/job/jobname2/42/");
assertEquals("42", runId);

// folder name has numbers
runId = subject.getRunId("job/22/42/");
assertEquals("42", runId);
}
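
The test above covers three well-formed inputs. As a complement, this standalone sketch reproduces the split-based logic of `getRunId` (without logging) to show inputs where it returns something that is not a run id, which is presumably what the reviewer meant by "too simplistic". `RunIdSplitSketch` and `lastSegment` are hypothetical names. The trailing slash is harmless because `String.split` drops trailing empty strings.

```java
// Standalone sketch; RunIdSplitSketch and lastSegment are hypothetical names.
class RunIdSplitSketch {
    /** Same approach as getRunId in the diff: last "/"-separated segment, or null if too short. */
    static String lastSegment(String runUrl) {
        String[] parts = runUrl.split("/");
        if (parts.length < 3) {
            return null;
        }
        return parts[parts.length - 1].trim();
    }

    public static void main(String[] args) {
        // Well-formed run URL.
        System.out.println(lastSegment("job/jobname/42/"));
        // A nested job URL without a run id still yields a value: the job name.
        System.out.println(lastSegment("job/folder/job/name/"));
        // A sub-page of a run yields the page name, not the run id.
        System.out.println(lastSegment("job/jobname/42/console"));
        // Too few segments.
        System.out.println(lastSegment("job/42/"));
    }
}
```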

private PodStatus buildStatus(String phase) {
return new PodStatusBuilder().withPhase(phase).build();
}