Move archives.jenkins.io service away from Oracle #3760

Closed
dduportal opened this issue Sep 23, 2023 · 14 comments

Comments

@dduportal
Contributor

dduportal commented Sep 23, 2023

Service(s)

Archives

Summary

We want to stop using Oracle Cloud, as Oracle is no longer sponsoring the Jenkins infrastructure.

The main consequence is that the archives.jenkins.io VM must move away from Oracle Cloud.

There are 3 immediate destinations:

  • Azure:
    • 👎 adds costs for the VM or the pod
    • 👍 data is already present on the mirrorbits-binary bucket
    • 👎 adds outbound bandwidth costs
  • DigitalOcean
    • 👎 adds costs for the VM or the pod
    • 👎 need to sync 500+ GB of data
    • 👍 outbound bandwidth is free until the 500 GB threshold is reached
  • OSUOSL
    • 👎 not enough disk

We could also consider Cloudflare in the near future (no bandwidth cost and cheap storage pricing).


@dduportal dduportal added the triage Incoming issues that need review label Sep 23, 2023
@dduportal dduportal added this to the infra-team-sync-2023-09-26 milestone Sep 23, 2023
@dduportal dduportal self-assigned this Sep 25, 2023
@dduportal dduportal removed the triage Incoming issues that need review label Sep 25, 2023
@dduportal
Contributor Author

It's DigitalOcean time!

@dduportal
Contributor Author

dduportal commented Sep 25, 2023

  • Create a new VM in DigitalOcean (DO) for archives.jenkins.io (estimated cost: $88/month)

    • 2 vCPUs (rather than 1), because usage will increase once the VM is added as a fallback for mirrorbits (even though the current 2-vCPU Oracle VM shows 5-7% CPU usage over the past 3 months)
    • 4 GB of RAM (no 2-vCPU DigitalOcean instance has less, and more than 2 GB isn't needed as only 700 MB are used on the current Oracle VM)
    • 700 GB disk (current usage is 505 GB on a 1 TB volume on Oracle)
    • Need a public IP
    • Proposed name: do.archives.jenkins.io for the VM + add a DNS A record to this VM's Public IP
    • Add the admin SSH key to sops for the Jenkins infra team
    • Add firewall rules: HTTP/HTTPS from everywhere, but incoming SSH only from admin IPs and/or trusted.ci (this is why we can't use doks-public)
  • Add the VM under Puppet management

    • Create a new node in the node list and apply disk formatting
    • Set up the roles, but copy the Let's Encrypt certificates (the HTTP-01 challenge can't succeed before the DNS points to the new VM)
    • Apply to VM once created
  • Initial data migration with rsync from current Oracle VM to DigitalOcean

    • Run rsync a 2nd time and measure its duration
  • Next steps:

    • Check the current update system for archives.jenkins.io (update_center2), as its .ssh/known_hosts might need updating after the migration
    • Announce the operation
    • Migrate DNS (CNAME)
    • Run rsync one last time
    • Then run update_center2 and check it works with rsync
    • Plan the Oracle VM decommissioning after all of that
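The repeated rsync passes in this plan (initial copy, a timed 2nd pass, a final pass after the DNS switch) can be sketched as follows. This is a minimal sketch assuming a pull from the new VM; the host alias `oracle-archives` and the `/srv/releases` paths are hypothetical placeholders, and the flag selection is a suggestion rather than what was actually run.

```python
import shlex
import subprocess
import time


def build_rsync_cmd(source_host: str, source_dir: str, dest_dir: str,
                    delete: bool = False, dry_run: bool = True) -> list:
    """Return the argv for one rsync pass pulling source_host:source_dir into dest_dir."""
    cmd = [
        "rsync",
        "--archive",     # recurse and preserve permissions, times and symlinks
        "--hard-links",  # keep hard-linked files linked on the destination
        "--partial",     # keep partially transferred files so big files can resume
        "--stats",       # print a transfer summary, handy to compare passes
    ]
    if delete:
        # Final pass only: remove destination files that no longer exist on the source
        cmd.append("--delete")
    if dry_run:
        # Default to a dry run so the command can be reviewed first
        cmd.append("--dry-run")
    # Trailing slashes: copy the *contents* of source_dir into dest_dir
    cmd.append(f"{source_host}:{source_dir.rstrip('/')}/")
    cmd.append(f"{dest_dir.rstrip('/')}/")
    return cmd


def timed_pass(cmd: list) -> float:
    """Run one rsync pass and return how long it took, in seconds."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical host alias and paths: replace with the real ones.
    first = build_rsync_cmd("oracle-archives", "/srv/releases", "/srv/releases")
    print(shlex.join(first))
```

Since rsync only transfers the delta after the first pass, the duration of the 2nd pass gives a rough idea of how long the final pass (and thus the cutover window) will take.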

@dduportal
Contributor Author

Update:

  • VM created after a hotfix: DigitalOcean requires volume names to be lowercase alphanumeric 😅
  • For the record, the default user of DO droplets is root (associated with the provided SSH key)
  • Currently upgrading packages and rebooting through SSH as root
  • I don't see any certificate request on the Puppet master: need to check the cloud-init process

@dduportal
Contributor Author

Update:

$ dig AAAA @ns2.digitalocean.com archives.do.jenkins.io   
# ...
;; ANSWER SECTION:
archives.do.jenkins.io. 1800    IN      AAAA    2a03:b0c0:3:d0::9bc:d001
# ...
$ dig A @ns3.digitalocean.com archives.do.jenkins.io   
# ...
;; ANSWER SECTION:
archives.do.jenkins.io. 1800    IN      A       46.101.121.132
# ...
$ dig A @ns1.digitalocean.com repo.do.jenkins.io    
# ...
;; ANSWER SECTION:
repo.do.jenkins.io.     1800    IN      A       157.245.23.55
# ...
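These checks can be scripted, for example to re-run them after each DNS change. A minimal sketch, assuming the `dig` CLI is installed (`+noall +answer` restricts its output to the ANSWER SECTION); the helper names are hypothetical.

```python
import subprocess
from typing import Dict, List, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    name: str
    ttl: int
    rclass: str
    rtype: str
    value: str


def parse_answer_line(line: str) -> Record:
    """Parse one ANSWER SECTION line, e.g. 'host. 1800 IN A 192.0.2.1'."""
    # dig pads fields with spaces/tabs; maxsplit keeps values containing spaces (TXT) intact
    name, ttl, rclass, rtype, value = line.split(maxsplit=4)
    return Record(name.rstrip("."), int(ttl), rclass, rtype, value.strip())


def query(server: str, name: str, rtype: str) -> List[Record]:
    """Ask one authoritative server directly (requires the `dig` binary)."""
    out = subprocess.run(
        ["dig", "+noall", "+answer", f"@{server}", name, rtype],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_answer_line(l) for l in out.splitlines()
            if l.strip() and not l.startswith(";")]


def mismatches(records: List[Record], expected: Dict[Tuple[str, str], str]) -> List[str]:
    """Compare records with expected {(name, type): value}; keeps one value per key."""
    found = {(r.name, r.rtype): r.value for r in records}
    errors = []
    for (name, rtype), want in expected.items():
        got: Optional[str] = found.get((name, rtype))
        if got != want:
            errors.append(f"{name} {rtype}: expected {want}, got {got}")
    return errors
```

For instance, `mismatches(query("ns1.digitalocean.com", "repo.do.jenkins.io", "A"), {("repo.do.jenkins.io", "A"): "157.245.23.55"})` should return an empty list while the records above are in place.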

@dduportal
Contributor Author

Update: data migration from Oracle is ready to roll:

Next step: Puppet before the real migration

@dduportal
Contributor Author

Update:

  • The mounted volume required a 2nd step after the VM reboot: resize2fs /dev/sda with the volume unmounted
    • Checked by mounting it to /srv and checking the new size with df -h /srv.
    • Had to run e2fsck -f /dev/sda first (resize2fs asks for a forced check before resizing an unmounted filesystem)
    • No data loss \o/ (checked the number of used blocks to be really paranoid)
  • Started the rsync from the old to the new VM, as I had 2-3 hours before working on Puppet
  • Copied /etc/letsencrypt from the current VM to ease the bootstrap process. ⚠️ The certificates must be renewed once the VM is online, to avoid the same problem as SSL certificate for ci.jenkins.io expires in 23 days #3740 (comment) in 3 months...
  • feat: initial support of the archives.do.jenkins.io VM jenkins-infra#3084 => Adding under puppet management
  • Started the initial Puppet provisioning
    • Stopped the puppet agent with systemctl stop puppet
    • Signed the certificate on the puppet master
    • Started the puppet agent
    • Checking logs with journalctl
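The resize and Puppet bootstrap steps above boil down to ordered lists of commands. A minimal sketch that replays them, dry-run by default; the certname is hypothetical, and `puppetserver ca sign --certname` assumes a Puppet 6+ server (the issue doesn't say which command was used on the master).

```python
import shlex
import subprocess
from typing import List

DEVICE = "/dev/sda"     # block device of the DigitalOcean volume (from the issue)
MOUNTPOINT = "/srv"     # where the volume is mounted (from the issue)
CERTNAME = "archives.do.jenkins.io"  # hypothetical Puppet certname

# Offline ext4 grow: resize2fs wants a forced fsck first on an unmounted filesystem
RESIZE_STEPS = [
    ["umount", MOUNTPOINT],
    ["e2fsck", "-f", DEVICE],
    ["resize2fs", DEVICE],        # no size argument: grow to fill the device
    ["mount", DEVICE, MOUNTPOINT],
    ["df", "-h", MOUNTPOINT],     # check the new size
]

# First Puppet run: stop the agent, sign on the master, then start and watch logs
AGENT_STOP = [["systemctl", "stop", "puppet"]]
MASTER_SIGN = [["puppetserver", "ca", "sign", "--certname", CERTNAME]]  # assumption: Puppet 6+ CA CLI
AGENT_START = [
    ["systemctl", "start", "puppet"],
    ["journalctl", "-u", "puppet", "--no-pager", "-n", "100"],
]


def run_steps(steps: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print (and optionally execute) each step in order, stopping at the first failure."""
    rendered = []
    for step in steps:
        line = shlex.join(step)
        rendered.append(line)
        print(("[dry-run] " if dry_run else "+ ") + line)
        if not dry_run:
            subprocess.run(step, check=True)  # raises CalledProcessError on failure
    return rendered
```

`AGENT_STOP` and `AGENT_START` run on the new VM while `MASTER_SIGN` runs on the Puppet master, which is why they are kept as separate lists.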

@dduportal
Contributor Author

Update:

  • Puppet agent worked properly

  • Rebooted the VM:

    • Volume is mounted properly
    • Puppet agent ran at startup with success
    • Re-ran the agent a 3rd time: no changes and no errors
  • Continuing the rsync migration. Status:

    $ date && echo && df -h /srv
    Tue Sep 26 12:56:45 PM UTC 2023
    
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sda        744G  179G  528G  26% /srv
  • The web service is up and running with HTTPS (the certificate only covers archives.jenkins.io and archives.jenkins-ci.org for now):

$ curl --verbose --header 'Host: archives.jenkins.io' --insecure https://archives.do.jenkins.io/TIME
*   Trying 46.101.121.132:443...
* Connected to archives.do.jenkins.io (46.101.121.132) port 443 (#0)
* ALPN: offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-AES256-GCM-SHA384
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=archives.jenkins-ci.org
*  start date: Aug  3 05:06:52 2023 GMT
*  expire date: Nov  1 05:06:51 2023 GMT
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* using HTTP/2
* h2 [:method: GET]
* h2 [:scheme: https]
* h2 [:authority: archives.jenkins.io]
* h2 [:path: /TIME]
* h2 [user-agent: curl/8.1.2]
* h2 [accept: */*]
* Using Stream ID: 1 (easy handle 0x14380bc00)
> GET /TIME HTTP/2
> Host: archives.jenkins.io
> User-Agent: curl/8.1.2
> Accept: */*
> 
< HTTP/2 200 
< last-modified: Tue, 26 Sep 2023 12:29:56 GMT
< etag: "b-606423b821678"
< accept-ranges: bytes
< content-length: 11
< date: Tue, 26 Sep 2023 12:57:36 GMT
< server: Apache
< 
1695731396
* Connection #0 to host archives.do.jenkins.io left intact
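Two checks from this update can be scripted: how far the copy has progressed (`Used` on `/srv` against the ~505 GB on the Oracle volume) and how fresh the served data is. The `/TIME` body above (`1695731396`) is a Unix timestamp matching the `last-modified` header (12:29:56 GMT); the sketch assumes it records the last sync time. It treats both sizes as gigabytes, which is only approximate since `df -h` uses binary units. Helper names are hypothetical and no network access is made.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict

SOURCE_USED_GB = 505  # data on the Oracle volume, as stated earlier in the issue
HEADERS = ["Filesystem", "Size", "Used", "Avail", "Use%", "Mounted on"]
UNITS = {"K": 1 / 1024 ** 2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}  # df -h uses powers of 1024


def parse_df(output: str, mountpoint: str) -> Dict[str, str]:
    """Return the `df -h` row mounted on `mountpoint`, keyed by column header."""
    for line in output.splitlines():
        cols = line.split()
        # Blank lines, the header (7 tokens) and the date line don't match a 6-column row
        if len(cols) == len(HEADERS) and cols[-1] == mountpoint:
            return dict(zip(HEADERS, cols))
    raise ValueError(f"no df row for {mountpoint}")


def to_gb(size: str) -> float:
    """Convert a df -h size such as '179G' to (binary) gigabytes."""
    return float(size[:-1]) * UNITS[size[-1]]


def copy_progress(row: Dict[str, str], source_gb: float = SOURCE_USED_GB) -> float:
    """Fraction of the source data already present on the destination volume."""
    return to_gb(row["Used"]) / source_gb


def sync_lag_seconds(time_body: str, date_header: str) -> int:
    """Seconds between the epoch stored in /TIME and the response's Date header."""
    synced = datetime.fromtimestamp(int(time_body.strip()), tz=timezone.utc)
    served = parsedate_to_datetime(date_header)
    return int((served - synced).total_seconds())
```

With the outputs above, the copy was at roughly 35% of the source data, and `/TIME` was 1660 s (about 27 minutes) older than the response.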

@dduportal
Contributor Author

Update:

  • The rsync finished.
  • Ran a 2nd rsync
  • Migrated the archives.jenkins.io and archives.jenkins-ci.org DNS records (not managed as code) with a 1 min TTL.
    • TODO: remove the former phoenix.archives record in jenkins.io
    • TODO: manage the current CNAME records in Terraform
  • Stopped the apache2 server on the Oracle VM
  • WIP: I forgot to allow the pkg and trusted IPs for rsync.
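The issue doesn't say where the rsync allowlist lives (rsyncd `hosts allow`, the DigitalOcean firewall, or Puppet), so this minimal sketch only models the CIDR matching. The entries are hypothetical: documentation prefixes (RFC 5737 / RFC 3849) stand in for the real pkg, trusted.ci and admin addresses.

```python
import ipaddress
from typing import Dict, List, Optional

# Hypothetical allowlist: documentation prefixes stand in for the real addresses.
RSYNC_ALLOWLIST: Dict[str, List[str]] = {
    "pkg": ["192.0.2.10/32"],
    "trusted.ci": ["198.51.100.0/28"],
    "admins": ["203.0.113.0/24", "2001:db8::/32"],
}


def allowed_by(ip: str, allowlist: Dict[str, List[str]] = RSYNC_ALLOWLIST) -> Optional[str]:
    """Return the allowlist entry matching `ip`, or None if rsync should refuse it."""
    addr = ipaddress.ip_address(ip)
    for name, cidrs in allowlist.items():
        for cidr in cidrs:
            # Membership is simply False across IPv4/IPv6, so mixed lists are fine
            if addr in ipaddress.ip_network(cidr):
                return name
    return None
```

The same structure could describe the SSH rule from the initial plan (admin IPs and trusted.ci only).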

@dduportal
Contributor Author

Update:

@dduportal
Copy link
Contributor Author

Update: the old Oracle VM has been shut down but not deleted yet. We'll remove it on Friday (leaving time to check whether we missed something).

dduportal added a commit to jenkins-infra/azure that referenced this issue Sep 28, 2023
…jenkins.io and archives.jenkins-ci.org (#488)

Related to jenkins-infra/helpdesk#3760

This PR defines the existing `CNAME` DNS records for
`archives.jenkins.io` and `archives.jenkins-ci.org`.

These records have been imported in the terraform state. Merging this PR
will apply the following changes (only add tags to the records):

```
Terraform will perform the following actions:

  # azurerm_dns_cname_record.archives_jenkins_io["jenkins-ci.org"] will be updated in-place
  ~ resource "azurerm_dns_cname_record" "archives_jenkins_io" {
        id                  = "/subscriptions/<redacted>/resourceGroups/proddns_jenkinsci/providers/Microsoft.Network/dnsZones/jenkins-ci.org/CNAME/archives"
        name                = "archives"
      ~ tags                = {
          + "repository" = "jenkins-infra/azure"
          + "scope"      = "terraform-managed"
        }
        # (5 unchanged attributes hidden)
    }

  # azurerm_dns_cname_record.archives_jenkins_io["jenkins.io"] will be updated in-place
  ~ resource "azurerm_dns_cname_record" "archives_jenkins_io" {
        id                  = "/subscriptions/<redacted>/resourceGroups/proddns_jenkinsio/providers/Microsoft.Network/dnsZones/jenkins.io/CNAME/archives"
        name                = "archives"
      ~ tags                = {
          + "repository" = "jenkins-infra/azure"
          + "scope"      = "terraform-managed"
        }
        # (5 unchanged attributes hidden)
    }

Plan: 0 to add, 2 to change, 0 to destroy.
```

Signed-off-by: Damien Duportal <[email protected]>
@dduportal
Contributor Author

Update:

@dduportal
Contributor Author

Update:

  • Removed the VM, storage, tenant and network from Oracle cloud with @MarkEWaite
  • Removed last remnants of the former public IP.

We can close this issue 🥳

@dduportal
Contributor Author

Reopening, as 2 documentation/cleanup items are missing:

  • Leftovers in jenkins-infra/jenkins-infra
  • Runbook to update
