Old inbound-agent published as "latest" #3318

Closed
TobiX opened this issue Jan 4, 2023 · 8 comments

TobiX commented Jan 4, 2023

Service(s)

Docker Hub

Summary

It seems that an old version of jenkins/inbound-agent was published to Docker Hub on 2022-12-27. Looking at https://hub.docker.com/r/jenkins/inbound-agent/tags, latest seems to be the same as 4.10-3, which was pushed at the same time.

Reproduction steps

  1. Pull jenkins/inbound-agent:latest from Docker Hub
  2. Get an old agent version...

TobiX added the "triage" label (incoming issues that need review) on Jan 4, 2023
lemeurherve added this to the infra-team-sync-2023-01-10 milestone on Jan 9, 2023

@dduportal
Contributor

dduportal commented Jan 10, 2023

Thanks for reporting @TobiX !

On short term

It seems fixed with the 3085.vc4c6977c075a-3 release, but we still have to find out why the deployment job is rebuilding old tags despite everything being set up to ignore tags older than 6 days.

On medium term

For the SRE team here, I propose the following plan:

On trusted.ci.jenkins.io (the controller in charge of building + deploying the Docker images to Docker Hub),

  • Back up the current job for the 3 Docker agent images (it is a GitHub Organization scanning job that filters on the 3 repositories: jenkinsci/agent, jenkinsci/inbound-agent and jenkinsci/docker-ssh-agent)
  • Once backed up, let's disable this (gh org) job
  • Then, let's create 3 multibranch jobs, from scratch, with the expected setup (only build tags, not older than 3 days, etc.)
    • Manual management, for today (see long term)
  • Then, let's watch yesterday's tag being rebuilt and redeployed once, successfully (and no other builds)

On long term

Following #2845, the 3 release jobs should be migrated to release.ci.
Ideally, we should use JCasC + JobDSL for proper "as code" management.

dduportal removed the "triage" label (incoming issues that need review) on Jan 10, 2023
@dduportal
Contributor

We seem to have found the "root cause": it's the "SCM Polling":

  • In this case (Docker images jenkins/agent, jenkins/inbound-agent and jenkins/ssh-agent), the deployment controller trusted.ci.jenkins.io has 1 multibranch pipeline per image, configured to only discover tags, and the build policy is to ignore tags older than 3 days.
    • It works well when triggering the Multibranch pipeline scanning (manually or once a day).
  • However, the pipeline used to have a triggers { pollSCM() } directive (example: https://github.com/jenkinsci/docker-agent/blob/6e4b1e09465e9fac9c0f234f38988353527724e9/Jenkinsfile#L9-L11 for the 4.7-1 tag of the jenkins/agent image's code). Tags whose Jenkinsfile contains this directive still have polling enabled.
    • As soon as the last successful build for a given tag is absent (never succeeded, build history cleaned up, or any other reason), polling considers that the tag should be rebuilt at the pipeline level, which does NOT have the build policy of "do not build if older than 3 days".
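
The two decision paths described above can be modelled as a minimal sketch. This is not Jenkins code: the function names scan_would_build and poll_would_build are hypothetical, and the 3-day limit is passed in as a parameter. The sketch only illustrates that the age check lives in the scan path, so the polling path never consults it.

```shell
#!/bin/sh
# Illustrative model only: these functions are hypothetical and do not
# reproduce Jenkins internals.

# Scan path: the multibranch build strategy rejects tags older than max_age_days.
# Arguments: tag creation time and current time (Unix epoch seconds), max age in days.
scan_would_build() {
    tag_epoch=$1
    now_epoch=$2
    max_age_days=$3
    age_seconds=$((now_epoch - tag_epoch))
    [ "$age_seconds" -le $((max_age_days * 86400)) ]
}

# Polling path: a build is scheduled when no last successful build exists.
# Argument: "yes" if a last successful build exists for the tag.
# The tag age is never looked at here.
poll_would_build() {
    has_last_success=$1
    [ "$has_last_success" != "yes" ]
}

now=1673913600      # 2023-01-17 00:00:00 UTC
old_tag=1577836800  # 2020-01-01 00:00:00 UTC

if scan_would_build "$old_tag" "$now" 3; then echo "scan: build"; else echo "scan: skip"; fi
if poll_would_build "no"; then echo "poll: build"; else echo "poll: skip"; fi
```

With these inputs the scan path skips the 2020 tag while the polling path schedules it, which matches the symptom reported in this issue.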

@dduportal
Contributor

We created a new multibranch pipeline to validate the assumption on trusted.ci.jenkins.io:

  • Current Organization Folder named "Agents" (id docker) had been backed up
  • New folder "Agents (new)" (id Agents) in job/Containers (to separate the new and old jobs) with a description pointing to this issue
  • New Multibranch Pipeline named "docker-agent" (id docker-agent) with the same setup as the previous one (generated by the Org. scanning) except that:
    • Behaviours -> "Advanced clone behaviours" -> "Honor refspec on initial clone" is disabled (no need as we only clone tags)
    • Removed "Prune stale remote-tracking branches" and "Prune stale tags" to avoid interfering with the discovered git references
    • Disabled "Discard old items" as we want to keep the build history (restricting to tags only means that we do not have a lot of builds)

=> the initial scan behaved as expected ✅

  • All tags are discovered
  • Only the 3085.vc4c6977c075a-4 tag was triggered and ran successfully, because the tag dates from yesterday
  • No SCM polling is present
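
To cross-check which tags an age filter like this would accept, a small helper can be fed from git. This is a sketch under assumptions: recent_tags is a hypothetical helper name, and the creation date reported by git may not be exactly the timestamp Jenkins uses for a tag.

```shell
#!/bin/sh
# Sketch: print the tags created within the last max_days days.
# recent_tags is a hypothetical helper; it reads "<epoch> <tag>" lines on stdin.
recent_tags() {
    max_days=$1
    now_epoch=$2
    awk -v max_days="$max_days" -v now="$now_epoch" '
        # $1 is the tag creation time in epoch seconds, $2 is the tag name.
        NF >= 2 && (now - $1) <= max_days * 86400 { print $2 }
    '
}

# Possible usage from inside a clone of one of the image repositories:
#   git for-each-ref --format='%(creatordate:unix) %(refname:short)' refs/tags \
#     | recent_tags 3 "$(date +%s)"
```

Keeping the filter separate from the git call makes it easy to replay the check against a saved tag list.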

@dduportal
Contributor

dduportal commented Jan 17, 2023

Todo:

  • disable the "discard old items / orphaned children" setting on the old jobs setup
  • disable the old MB for docker-agent
  • Create the new MB for docker-inbound-agent
  • disable the old MB for docker-inbound-agent
  • Create the new MB for docker-ssh-agent on Friday, January 20, after 11:10 GMT+1 (to ensure that the last tag of this repository, which still had the pollSCM trigger, is not rebuilt)
  • disable the old MB for docker-ssh-agent
  • delete the old setup after 7 days if nothing bad happens

@dduportal
Contributor

dduportal commented Jan 18, 2023

While creating the new job for docker-inbound-agent, @smerle33 and I saw the "weird" behavior happen again: despite the build strategy being set up to ignore tags older than 3 days, plus the "skip initial build when indexing" option, some old tags (from 2019/2020) were randomly scheduled to build.

We stopped the builds manually and replayed the following pipeline to ensure that git polling was not persisted:

/* NOTE: this Pipeline mainly aims at catching mistakes (wrongly formed Dockerfile, etc.)
 * This Pipeline is *not* used for actual image publishing.
 * This is currently handled through Automated Builds using standard Docker Hub feature
*/
pipeline {
    agent none

    options {
        timeout(time: 2, unit: 'MINUTES')
        buildDiscarder(logRotator(daysToKeepStr: '10'))
        timestamps()
    }

    stages {
        stage('Build Docker Image') {
            steps {
                echo 'OK'
            }
        }
    }
}

(edit) We also cleaned up the builds of the new docker-inbound-agent job that referenced "polling", as an additional safety measure, with:

# From /var/lib/jenkins/jobs/Containers/jobs/Agents/jobs/docker-inbound-agent
$ for f in $(grep -riI polling . | cut -f1 -d: );do echo "==$f"; rm -rf $(dirname "$f");done
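
A more defensive variant of the loop above could look like the following sketch. It lists the matching directories first and only deletes them when explicitly asked. The names find_polling_builds, cleanup_polling_builds and the DELETE switch are assumptions for illustration, not what was actually run on the controller.

```shell
#!/bin/sh
# Sketch: list (and optionally delete) directories whose files mention "polling".
# find_polling_builds, cleanup_polling_builds and DELETE are hypothetical names.

find_polling_builds() {
    job_dir=${1%/}
    # -r recurse, -i ignore case, -I skip binary files, -l print only file names.
    grep -rilI polling "$job_dir" 2>/dev/null | while IFS= read -r file; do
        dirname "$file"
    done | sort -u
}

cleanup_polling_builds() {
    job_dir=${1%/}
    find_polling_builds "$job_dir" | while IFS= read -r build_dir; do
        # Never delete the job directory itself (a match in a top-level file).
        [ "$build_dir" = "$job_dir" ] && continue
        echo "==$build_dir"
        if [ "${DELETE:-0}" = "1" ]; then
            rm -rf -- "$build_dir"
        fi
    done
}

# Dry run:  cleanup_polling_builds /path/to/job
# Delete:   DELETE=1 cleanup_polling_builds /path/to/job
```

Reading file names line by line keeps paths containing spaces intact, although paths containing newlines are still not handled.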

@dduportal
Contributor

  • New set of jobs created + old one disabled
  • Let's validate that it works by releasing new images for all (with the usual OS updates)

@dduportal
Contributor

Last tag of docker-agent was picked up and built by the new jobs automatically (ref. jenkinsci/docker-inbound-agent#326)

It looks like we're good to go!
