From 1fa971c3a13bc0ea4b0e9c60d81648e8101d3aa1 Mon Sep 17 00:00:00 2001
From: Jakub Kadlcik
Date: Wed, 27 Nov 2024 22:12:49 +0100
Subject: [PATCH 1/6] doc: fixes during the F39 to F41 infrastructure update pt.1

See #3460
---
 doc/how_to_upgrade_persistent_instances.rst | 28 ++++++++++++++-------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/doc/how_to_upgrade_persistent_instances.rst b/doc/how_to_upgrade_persistent_instances.rst
index 951411c11..3d704fa40 100644
--- a/doc/how_to_upgrade_persistent_instances.rst
+++ b/doc/how_to_upgrade_persistent_instances.rst
@@ -16,6 +16,8 @@ Requirements
 * Since we do not modify the public IPs (neither v4 nor v6), no DNS
   modifications should be required. However, familiarize yourself with the
   `DNS SOP`_ in case of any issues.
+* Make sure you have `/usr/bin/aws` installed and that you have `fedora-copr`
+  section in `~/.aws/credentials`
 
 Pre-upgrade
 ===========
@@ -40,8 +42,8 @@ Ensure you have the `helper playbook repository`_ cloned locally and navigate
 to the clone directory.
 
 Review the ``dev.yml``, ``prod.yml``, and ``all.yml`` configurations in the
-``./group_vars`` directory. Pay particular attention to the ``old_instance_id``,
-``old_network_id``, and data volume IDs as **these MUST match the EC2 reality**.
+``./group_vars`` directory. Pay particular attention to the data volume IDs as
+**these MUST match the EC2 reality**.
 
 In the following moments, you will run several playbooks on your machine.
 During execution, explicitly specify two Ansible variables, ``copr_instance``
@@ -54,10 +56,10 @@ During execution, explicitly specify two Ansible variables, ``copr_instance``
 Identify the AMI (golden images) you want to use for the new VM instances.
 Typically, upgrade to ``Fedora N+2`` (e.g., migrating infrastructure from Fedora
 37 to Fedora 39). Visit the `Cloud Base Images`_ download page, locate the
-**Intel and AMD x86_64 systems** section, and click the button next to
-**Fedora Cloud 39 AWS** (ensure JavaScript is enabled for this page!).
-Note the ``ami-*`` ID in the **US East (N. Virginia)** region (for example
-``ami-0746fc234df9c1ee0``). Specify this ``ami-*`` ID in
+**Launch on public cloud platforms** section for **x86_64-based instances**, and
+click the button next to **Fedora Cloud 41 AWS** (ensure JavaScript is enabled
+for this page!). Note the ``ami-*`` ID in the **US East (N. Virginia)** region
+(for example ``ami-0746fc234df9c1ee0``). Specify this ``ami-*`` ID in
 ``group_vars/all.yml``, and ensure both ``group_vars/{dev,prod}.yml`` correctly
 reference it.
 
@@ -65,8 +67,14 @@ Double-check other machine parameters such as instance types, names, tags, IP
 addresses, root volume sizes, etc. Usually, the pre-filled defaults suffice, but
 verification is recommended.
 
-Use the `ec2instances.info`_ comparator to find the cheapest available instance
-type that meets our needs whenever more power is required.
+.. note::
+    Use the `ec2instances.info`_ comparator to find the cheapest available
+    instance type that meets our needs whenever more power is required.
+
+.. note::
+
+    Don't worry about ``old_instance_id`` and ``new_instance_id`` for now. We
+    will change them after running the first set of playbooks
 
 .. warning::
 
@@ -97,6 +105,7 @@ Launch new instances
 
 As simple as::
 
+    $ opts=( -e copr_instance=dev -e server_id=keygen )
     $ ansible-playbook play-vm-migration-01-new-box.yml "${opts[@]}"
 
 You'll see an output like::
@@ -112,7 +121,8 @@ You'll see an output like::
     }
 
 Now fix the corresponding ``new_instance_id`` and ``new_network_id`` options in
-``group_vars/{dev,prod}.yml`` according to the output.
+``group_vars/{dev,prod}.yml`` according to the output. Also update
+``old_instance_id`` and ``old_network_id`` options.
 
 Note the Private IP addresses
 -----------------------------

From 923ae4d5e831ae76282b0d061e322b1c64ed1b9d Mon Sep 17 00:00:00 2001
From: Pavel Raiskup
Date: Thu, 28 Nov 2024 07:51:35 +0100
Subject: [PATCH 2/6] doc: fixes during the F39 to F41 infrastructure update pt.2

---
 doc/how_to_upgrade_persistent_instances.rst | 35 +++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/doc/how_to_upgrade_persistent_instances.rst b/doc/how_to_upgrade_persistent_instances.rst
index 3d704fa40..9f7b2cd21 100644
--- a/doc/how_to_upgrade_persistent_instances.rst
+++ b/doc/how_to_upgrade_persistent_instances.rst
@@ -35,6 +35,33 @@ Announce the outage
 See a specific document :ref:`announcing_fedora_copr_outage`, namely the
 "planned" outage state.
 
+Check the hot-fixes
+-------------------
+
+The old set of instances (especially prod) has been running for quite some time,
+likely accumulating several hotfixes over that period. Research the applied
+hotfixes and determine which of them need to be manually implemented on the N+2
+boxes (if any, note them).
+
+First, check the `hot-fixed issues and PRs `_.
+Then, check the file-system modifications::
+
+    # over ssh on the _old_ box, search for weird things (ignore config changes
+    # and /boot)
+    [root@copr-be-dev ~][STG]# rpm -Va | grep -v -e /etc/ -e /boot/
+    ...
+    S.5....T.    /var/www/cgi-resalloc
+    ...
+    S.5....T.    /usr/lib/python3.12/site-packages/copr_backend/pulp.py
+    ...
+
+E.g., the ``/var/www/cgi-resalloc`` file is a weird change, but that in
+particular is covered `in playbooks `_.
+The ``pulp.py`` change is important to note though! You may consult the
+``dnf diff copr-backend`` output, find the corresponding upstream PR on GitHub,
+and tag the PR with ``hot-fixed`` label (if not already done).
+
+
 Preparation
 -----------
 
@@ -251,6 +278,10 @@ It's possible that the playbook fails, but it typically isn't crucial now.
 If provisioning at least reaches the end of the ``base`` role, revert the
 ``birthday=yes`` commit and proceed with the next steps.
 
+The playbooks above have not automatically updated the systems. If you prefer
+to start on Fedora N+2 with up-2-date set of packages, do the ``dnf update`` now
+(manual step over ssh).
+
 Get it working
 --------------
 
@@ -273,6 +304,10 @@ Post-upgrade
 
 By this point, every Copr service should be operational.
 
+It's a good idea to test ``/usr/sbin/reboot`` now to debug potential boot issues
+during the outage window, as future reboots are likely to occur at the most
+inconvenient times.
+
 Rename the instance names
 -------------------------
 

From c4d0bd3bc85f76d9adda2216645dc0eae36aa280 Mon Sep 17 00:00:00 2001
From: Jakub Kadlcik
Date: Sun, 1 Dec 2024 00:47:26 +0100
Subject: [PATCH 3/6] beaker-tests-sanity: skip OpenMandriva tests

---
 .../Sanity/copr-cli-basic-operations/runtest-openmandriva.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/beaker-tests/Sanity/copr-cli-basic-operations/runtest-openmandriva.sh b/beaker-tests/Sanity/copr-cli-basic-operations/runtest-openmandriva.sh
index 07da5df05..e8c600400 100755
--- a/beaker-tests/Sanity/copr-cli-basic-operations/runtest-openmandriva.sh
+++ b/beaker-tests/Sanity/copr-cli-basic-operations/runtest-openmandriva.sh
@@ -19,6 +19,10 @@ CHROOTS+=" --chroot openmandriva-rolling-x86_64"
 rlJournalStart
     rlPhaseStartSetup
         setup_checks
+        # https://github.com/fedora-copr/copr/issues/3433
+        # https://github.com/rpm-software-management/mock/issues/1066
+        echo "OpenMandriva are known to be broken. Skipping."
+        exit 0
     rlPhaseEnd
 
     rlPhaseStartTest

From d4aee885ab0d80043ecbfce7ab7920e76279d962 Mon Sep 17 00:00:00 2001
From: Jakub Kadlcik
Date: Sun, 1 Dec 2024 11:33:54 +0100
Subject: [PATCH 4/6] beaker-tests-sanity: respect COPR_CLEANUP=false directive

---
 .../Sanity/copr-cli-basic-operations/runtest-createrepo.sh | 2 +-
 .../Sanity/copr-cli-basic-operations/runtest-storage.sh | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/beaker-tests/Sanity/copr-cli-basic-operations/runtest-createrepo.sh b/beaker-tests/Sanity/copr-cli-basic-operations/runtest-createrepo.sh
index 81cd49174..dd3b68eee 100755
--- a/beaker-tests/Sanity/copr-cli-basic-operations/runtest-createrepo.sh
+++ b/beaker-tests/Sanity/copr-cli-basic-operations/runtest-createrepo.sh
@@ -34,7 +34,7 @@ rlJournalStart
     rlPhaseEnd
 
     rlPhaseStartCleanup
-        rlRun "copr-cli delete ${NAME_PREFIX}Createrepo"
+        cleanProject "${NAME_PREFIX}Createrepo"
         rlRun "dnf -y copr remove ${URL}/${NAME_PREFIX}Createrepo"
     rlPhaseEnd
 rlJournalPrintText
diff --git a/beaker-tests/Sanity/copr-cli-basic-operations/runtest-storage.sh b/beaker-tests/Sanity/copr-cli-basic-operations/runtest-storage.sh
index cdec608cd..94e414e9a 100755
--- a/beaker-tests/Sanity/copr-cli-basic-operations/runtest-storage.sh
+++ b/beaker-tests/Sanity/copr-cli-basic-operations/runtest-storage.sh
@@ -47,12 +47,11 @@ rlJournalStart
             rlAssertRpm "hello"
             rlRun "dnf remove hello -y"
             rlRun "yes | dnf copr remove $DNF_COPR_ID/$project"
-
-            rlRun "copr-cli delete $project"
         done
     rlPhaseEnd
 
     rlPhaseStartCleanup
+        cleanProject "$project"
         workdirCleanup
     rlPhaseEnd
 rlJournalPrintText

From 94875c359b53f25ae8c72152cca0528d8d63ed47 Mon Sep 17 00:00:00 2001
From: Pavel Raiskup
Date: Sun, 1 Dec 2024 17:01:49 +0100
Subject: [PATCH 5/6] beaker: fix sleeping in runtest-createrepo

The first sleep is useless with DNF5, the mechanism (unlike DNF4)
doesn't even wait for the backend action processing. The second sleep
needs to be longer. The API call is relatively expensive, and therefore
cached for 120s. We need to wait for cache invalidation.
---
 .../copr-cli-basic-operations/runtest-createrepo.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/beaker-tests/Sanity/copr-cli-basic-operations/runtest-createrepo.sh b/beaker-tests/Sanity/copr-cli-basic-operations/runtest-createrepo.sh
index dd3b68eee..107d656e5 100755
--- a/beaker-tests/Sanity/copr-cli-basic-operations/runtest-createrepo.sh
+++ b/beaker-tests/Sanity/copr-cli-basic-operations/runtest-createrepo.sh
@@ -16,8 +16,6 @@ rlJournalStart
 
     rlPhaseStartTest
         rlRun "copr-cli create ${NAME_PREFIX}Createrepo --chroot $CHROOT"
-        echo "sleep 60 seconds to give backend enough time to generate the repo"
-        sleep 60
         # don't specify chroot here, rely on auto-detection
         rlRun "dnf -y copr enable ${URL}/${NAME_PREFIX}Createrepo"
         rlRun "dnf --disablerepo='*' \
@@ -25,8 +23,10 @@ rlJournalStart
             list available 2>&1 | grep 'Failed to synchronize'" 1
 
         rlRun "copr-cli modify ${NAME_PREFIX}Createrepo --chroot fedora-rawhide-x86_64"
-        echo "sleep 60 seconds to give backend enough time to generate the repo"
-        sleep 60
+
+        echo "wait 2+ minutes to invalidate cache"
+        echo "https://github.com/fedora-copr/copr/blob/526473b43b5e0c1f84f7db624f349a50a8e2b7d9/frontend/coprs_frontend/coprs/views/apiv3_ns/apiv3_rpmrepo.py#L37"
+        sleep 125
         rlRun "dnf -y copr enable ${URL}/${NAME_PREFIX}Createrepo fedora-rawhide-x86_64"
         rlRun "dnf --disablerepo='*' \
             --enablerepo='copr:${URL}:$(repo_owner):${NAME_VAR}Createrepo' \

From 2113da2e2efbb4791ed2350c227a3087408d1286 Mon Sep 17 00:00:00 2001
From: Pavel Raiskup
Date: Sun, 1 Dec 2024 19:08:41 +0100
Subject: [PATCH 6/6] beaker: make sure we grep the right project

---
 beaker-tests/Sanity/copr-cli-basic-operations/runtest.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/beaker-tests/Sanity/copr-cli-basic-operations/runtest.sh b/beaker-tests/Sanity/copr-cli-basic-operations/runtest.sh
index a004eeb50..dad38c6fb 100755
--- a/beaker-tests/Sanity/copr-cli-basic-operations/runtest.sh
+++ b/beaker-tests/Sanity/copr-cli-basic-operations/runtest.sh
@@ -313,9 +313,9 @@ rlJournalStart
 
         # test unlisted_on_hp project attribute
         rlRun "copr-cli create --unlisted-on-hp on --chroot $CHROOT ${NAME_PREFIX}Project7"
-        rlRun "curl $FRONTEND_URL --silent | grep Project7" 1 # project won't be present on hp
+        rlRun "curl $FRONTEND_URL --silent | grep ${NAME_PREFIX}Project7" 1 # project won't be present on hp
         rlRun "copr-cli modify --unlisted-on-hp off ${NAME_PREFIX}Project7"
-        rlRun "curl $FRONTEND_URL --silent | grep Project7" 0 # project should be visible on hp now
+        rlRun "curl $FRONTEND_URL --silent | grep ${NAME_PREFIX}Project7" 0 # project should be visible on hp now
 
         # FIXME It is now not possible to update whoosh index on demand
         # Instead, it is periodically recreated via cron