Move infra from Fedora 39 to 41 #3460

Closed
praiskup opened this issue Oct 7, 2024 · 12 comments
@praiskup
Member

praiskup commented Oct 7, 2024

Don't forget to go through hot-fixes:
https://github.com/fedora-copr/copr/issues?q=label%3Ahot-fixed+

@praiskup praiskup converted this from a draft issue Oct 7, 2024
@praiskup praiskup changed the title Move infra to F41 Move infra from Fedora 39 to 41 Oct 7, 2024
@nikromen nikromen moved this from Needs triage to In 3 months in CPT Kanban Oct 16, 2024
@FrostyX
Member

FrostyX commented Nov 1, 2024

After the update, make sure that iptables-legacy wasn't pulled back in (see #3439).

@nikromen nikromen moved this from In 3 months to In Progress in CPT Kanban Nov 18, 2024
FrostyX added a commit to FrostyX/copr that referenced this issue Nov 23, 2024
See fedora-copr#3460

Several issues appeared during the upgrade, and we are addressing them here:

- To my dismay, `python3-ipdb` is not available on F41.
  See https://bugzilla.redhat.com/show_bug.cgi?id=2280968
  Hopefully, the upstream issue will get fixed and the package will get
  unretired in Fedora. I love using it.
- The `dist-git-client` is not a part of Copr codebase anymore and it was moved
  to dist-git. See https://github.com/release-engineering/dist-git
  As a consequence, the name of the configuration directory changed.
- Removing mlocate. We discussed it several times and IIRC we don't need it

Also, please recreate the whole environment through
`docker-compose down --rmi all --volumes` and `docker-compose up -d`.
nikromen pushed a commit that referenced this issue Nov 25, 2024
See #3460
@FrostyX
Member

FrostyX commented Nov 26, 2024

We are currently working on upgrading the STG instances

@FrostyX
Member

FrostyX commented Nov 27, 2024

The package golang-github-prometheus-node-exporter is now dead in Fedora. I tried to rebuild it in Copr but ran into issues, even for F39, which should still support it, so I am not sure whether it can work for F41. Temporarily removing it from the playbooks:
https://pagure.io/fedora-infra/ansible/c/476d71fb8acc639c254015898e296347a4289ccb

@praiskup
Member Author

We need that package for the Prometheus monitoring in the Red Hat network (Grafana metrics). I think the package was renamed to node-exporter, per the changelog:

* Sat Jan 13 2024 Mikel Olasagasti Uranga <[email protected]> - 1.6.1-1
- Initial package, rename from golang-github-prometheus-node-exporter -
  Closes rhbz#2250145

@FrostyX
Member

FrostyX commented Nov 27, 2024

When upgrading the database via reindexdb --all, it spammed the output with hundreds of lines like this:

WARNING:  database "coprdb" has a collation version mismatch
DETAIL:  The database was created using collation version 2.38, but the operating system provides version 2.40.
HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE coprdb REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.

I did as suggested to fix the warnings:

bash-5.2$ psql coprdb
coprdb=# ALTER DATABASE coprdb REFRESH COLLATION VERSION;
coprdb=# ALTER DATABASE postgres REFRESH COLLATION VERSION;
coprdb=# ALTER DATABASE template1 REFRESH COLLATION VERSION;
bash-5.2$ reindexdb --all

No warnings now.
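
For the next upgrade, the manual step above could be scripted. The following is a minimal sketch, not something we run today: it assumes the `reindexdb --all` output has been captured as text, and the helper names `mismatched_databases` and `refresh_statements` are made up for illustration. It only builds the SQL statements; running them through `psql` is left to the operator.

```python
import re

# Matches the WARNING line printed for a database with a stale collation version.
_MISMATCH = re.compile(
    r'WARNING:\s+database "([^"]+)" has a collation version mismatch'
)


def mismatched_databases(output):
    """Return database names found in mismatch warnings, in first-seen order."""
    seen = []
    for name in _MISMATCH.findall(output):
        if name not in seen:
            seen.append(name)
    return seen


def refresh_statements(databases):
    """Build one ALTER DATABASE ... REFRESH COLLATION VERSION statement per name."""
    statements = []
    for name in databases:
        quoted = '"' + name.replace('"', '""') + '"'  # quote the identifier
        statements.append(
            "ALTER DATABASE {} REFRESH COLLATION VERSION;".format(quoted)
        )
    return statements
```

Feeding the captured warnings to `mismatched_databases` and the result to `refresh_statements` should yield the same three statements that were typed by hand above, provided all three databases appear in the warnings.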

@FrostyX
Member

FrostyX commented Nov 27, 2024

A lot of errors in the httpd log:

[root@copr-fe-dev ~][STG]# tail -f /var/log/httpd/error_log
[Wed Nov 27 21:10:13.341316 2024] [wsgi:error] [pid 128272:tid 128419] Exception ignored in: <_io.TextIOWrapper name='<wsgi.errors>' encoding='utf-8'>
[Wed Nov 27 21:10:13.341416 2024] [wsgi:error] [pid 128272:tid 128419] RuntimeError: log object has expired
[Wed Nov 27 21:10:16.610025 2024] [wsgi:error] [pid 128275:tid 128445] [remote 2600:1f18:8ee:ae00:6c8c:e094:1c5b:c2f9:36692] INFO:coprs:Generating SRPM builds
[Wed Nov 27 21:10:16.690663 2024] [wsgi:error] [pid 128275:tid 128439] Exception ignored in: <_io.TextIOWrapper name='<wsgi.errors>' encoding='utf-8'>
[Wed Nov 27 21:10:16.690821 2024] [wsgi:error] [pid 128275:tid 128439] RuntimeError: log object has expired

It seems that this is caused by mod_wsgi 5.0.1 and fixed by mod_wsgi 5.0.2, see GrahamDumpleton/mod_wsgi#912

Temporarily, I built the new upstream version in Copr and it indeed fixes the issue.

dnf copr enable frostyx/mod-wsgi
dnf update python3-mod_wsgi
systemctl restart httpd

I proposed a change to the Fedora package; hopefully it will land before we upgrade production:
https://src.fedoraproject.org/rpms/mod_wsgi/pull-request/18

Update: The mentioned Copr repo is not needed anymore, the package is in the repos (testing) now
https://bodhi.fedoraproject.org/updates/FEDORA-2024-235cc1f0b3
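
To confirm that the updated mod_wsgi really silenced the errors, the log can be checked mechanically instead of by eye. This is a rough sketch that assumes the `error_log` line format shown above; the function name `count_expired_log_errors` is hypothetical.

```python
import re
from collections import Counter

# Extracts the process id from an httpd error_log prefix such as
# "[pid 128272:tid 128419]".
_PID = re.compile(r"\[pid (\d+):tid \d+\]")


def count_expired_log_errors(lines):
    """Count 'log object has expired' errors per httpd process id."""
    counts = Counter()
    for line in lines:
        if "RuntimeError: log object has expired" not in line:
            continue
        match = _PID.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Running it over the lines written after `systemctl restart httpd` should give an empty counter if the fix works.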

@praiskup
Member Author

praiskup commented Nov 28, 2024

vgchange -a y was needed on be-dev.
Edit: https://pagure.io/fedora-infra/ansible/c/1073f70bce4d60dc2a40ebde2ec1f65de4ee2c3a

@FrostyX
Member

FrostyX commented Nov 28, 2024

Hotfixes in production:

Frontend:

--- /usr/share/copr/coprs_frontend/coprs/logic/actions_logic.py 2024-10-03 00:00:00.000000000 +0000
+++ /usr/share/copr/coprs_frontend/coprs/logic/actions_logic.py 2024-11-24 23:23:05.087732871 +0000
@@ -224,6 +224,7 @@
             # We can pick any random build because the assumption is, they are
             # all from the same project
             "storage": builds[0].copr.storage if builds else None,
+            "devel": builds[0].copr.devel_mode,
         }

         build_ids = []
Binary files ./usr/share/copr/coprs_frontend/coprs/views/apiv3_ns/__pycache__/apiv3_projects.cpython-312.pyc and /usr/share/copr/coprs_frontend/coprs/views/apiv3_ns/__pycache__/apiv3_projects.cpython-312.pyc differ
--- /usr/share/copr/coprs_frontend/coprs/views/apiv3_ns/apiv3_projects.py       2024-10-03 00:00:00.000000000 +0000
+++ /usr/share/copr/coprs_frontend/coprs/views/apiv3_ns/apiv3_projects.py       2024-10-10 10:58:51.654525740 +0000
@@ -242,7 +242,6 @@
                 selected_chroots=form.selected_chroots,
                 description=form.description.data,
                 instructions=form.instructions.data,
-                check_for_duplicates=True,
                 unlisted_on_hp=form.unlisted_on_hp.data,
                 build_enable_net=form.enable_net.data,
                 group=group,

Backend:

--- /usr/bin/copr-backend-resultdir-cleaner     2024-10-22 00:00:00.000000000 +0000
+++ /usr/bin/copr-backend-resultdir-cleaner     2024-11-22 13:12:28.523835714 +0000
@@ -1,7 +1,7 @@
 #! /usr/bin/python3

 """
-Cleanup the old chroot_scan folders
+Cleanup the files in resultdir that are no longer needed.
 """

 import logging
@@ -17,19 +17,23 @@
 from copr_backend.helpers import BackendConfigReader


-logging.basicConfig(level=logging.DEBUG)
 LOG = logging.getLogger(__name__)
-setup_script_logger(LOG, "/var/log/copr-backend/resultdir-cleaner.log")
-
 OLDER_THAN = time.time() - 24*3600*14

-parser = argparse.ArgumentParser(
-    description=("TBD")
-)
-parser.add_argument(
-    "--real-run",
-    action='store_true',
-    help=("Also perform the changes, not just checks"))
+
+def _get_arg_parser():
+    parser = argparse.ArgumentParser(
+        description=(
+            "Traverse the Copr Backend result directory and remove things "
+            "that are no longer needed → outdated log files, not uncleaned "
+            "temporary directories, etc."))
+    parser.add_argument(
+        "--real-run",
+        action='store_true',
+        help=(
+            "Perform the real removals (by default the tool just prints "
+            "what would normally happen = \"dry run\")."))
+    return parser


 def remove_old_dir(directory, dry_run):
@@ -69,6 +73,10 @@
             # to the [backend] prune_days=N config.
             continue

+        if os.path.basename(chroot_dir) == "modules":
+            todo_directory(chroot_dir, "MODULES")
+            continue
+
         for subdir in chroot_subdirs:
             chroot_subdir_path = os.path.join(chroot_dir, subdir)

@@ -162,14 +170,15 @@


 def _main():
-    args = parser.parse_args()
+    logging.basicConfig(level=logging.DEBUG)
+    config_file = os.environ.get("BACKEND_CONFIG", "/etc/copr/copr-be.conf")
+    opts = BackendConfigReader(config_file).read()
+    setup_script_logger(LOG, os.path.join(opts["log_dir"], "resultdir-cleaner.log"))
+    args = _get_arg_parser().parse_args()
     dry_run = not args.real_run
     if dry_run:
         LOG.warning("Just doing dry run, run with --real-run")

-    config_file = os.environ.get("BACKEND_CONFIG", "/etc/copr/copr-be.conf")
-    opts = BackendConfigReader(config_file).read()
-
     clean_in(opts.destdir, dry_run=dry_run)


--- /usr/lib/python3.12/site-packages/copr_backend/pulp.py      2024-10-22 00:00:00.000000000 +0000
+++ /usr/lib/python3.12/site-packages/copr_backend/pulp.py      2024-10-29 18:42:11.176004437 +0000
@@ -50,6 +50,14 @@
         """
         return (self.config["username"], self.config["password"])

+    @property
+    def cert(self):
+        """
+        See Client Side Certificates
+        https://docs.python-requests.org/en/latest/user/advanced/
+        """
+        return (self.config["cert"], self.config["key"])
+
     def url(self, endpoint):
         """
         A fully qualified URL for a given API endpoint
@@ -74,7 +82,12 @@
         """
         Default parameters for our requests
         """
-        return {"auth": self.auth, "timeout": self.timeout}
+        params = {"timeout": self.timeout}
+        if all(self.cert):
+            params["cert"] = self.cert
+        else:
+            params["auth"] = self.auth
+        return params

     def create_repository(self, name):
         """

Save the patch to a file and then apply with:

vim /tmp/hotfix.patch
sed -i /tmp/hotfix.patch -e "s/python3.12/python3.13/"
patch -d / -p 0 < /tmp/hotfix.patch
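
The `sed` step is needed because the hotfix was generated against Python 3.12 paths while F41 ships Python 3.13. A slightly more conservative variant, sketched below, rewrites only the `---`/`+++` file headers and leaves the hunk bodies alone. This is an illustration, not the procedure we used; `retarget_patch` is a made-up name.

```python
def retarget_patch(patch_text, old="python3.12", new="python3.13"):
    """Rewrite the interpreter directory in patch file headers only.

    Caveat: a removed source line that itself starts with "-- " also shows
    up as "--- " in a patch and would be touched as well.
    """
    out = []
    for line in patch_text.splitlines(keepends=True):
        if line.startswith(("--- ", "+++ ")):
            line = line.replace(old, new)
        out.append(line)
    return "".join(out)
```

The result can be written back to `/tmp/hotfix.patch` and applied with the same `patch -d / -p 0` command.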

@FrostyX
Member

FrostyX commented Dec 1, 2024

I think there is some slowness related to the CDN. This test fails:

rlRun "copr-cli modify ${NAME_PREFIX}Createrepo --chroot fedora-rawhide-x86_64"
echo "sleep 60 seconds to give backend enough time to generate the repo"
sleep 60
rlRun "dnf -y copr enable ${URL}/${NAME_PREFIX}Createrepo fedora-rawhide-x86_64"

but if I increase the sleep time to e.g. 180, it works fine. At the same time, I can see the repodata almost immediately when using ls on the backend.
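
Rather than picking a larger fixed sleep, the test could poll until the repo becomes visible and give up after a deadline. The test itself is a shell script, so the following Python is only a sketch of the idea; `wait_until` and its parameters are hypothetical, and the actual check (for example fetching the repodata through the CDN) is left to the caller.

```python
import time


def wait_until(check, timeout=180.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns a true value or timeout expires.

    Returns True as soon as check() succeeds, False once the deadline passes.
    The sleep and clock arguments exist so that tests can substitute fakes.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With this shape the test finishes as soon as the CDN catches up and waits the full 180 seconds only in the worst case.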

@praiskup
Member Author

praiskup commented Dec 2, 2024

@FrostyX
Member

FrostyX commented Dec 4, 2024

Some post-release things we should ideally address:

  • The ansible-fedora-copr playbooks should check that all dependencies are available on the local system. When the playbook fails somewhere in the middle, we end up in an inconsistent state, and we need to comment out parts of the playbook so it can run again. Too dangerous. I got two failures:
    • Missing fedora-copr section in ~/.aws/credentials
    • /usr/bin/aws not being installed on my system
  • The chrony package is probably baked into the cloud AMI, so we fail when creating our user on keygen. Possible solutions would be changing our UID to a higher number, or uninstalling the package in our playbooks, removing the user from /etc/passwd, then creating our user, and then possibly installing the chrony package back
  • On copr-distgit we unexpectedly got stuck when running /usr/bin/copr-dist-git-refresh-cgit through playbooks. It can take even an hour to finish, so we had to temporarily put exit 0 at the start of the script so that the playbooks go through quicker. This won't happen on copr-distgit-dev because we don't have as much data there
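
For the first point, a local pre-flight check run before the playbooks could catch both failures early. This is a minimal sketch covering only the two problems listed above; `preflight_problems` is a hypothetical helper and is not part of the playbooks.

```python
import configparser
import os
import shutil


def preflight_problems(credentials_path="~/.aws/credentials",
                       profile="fedora-copr", commands=("aws",)):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for command in commands:
        if shutil.which(command) is None:
            problems.append("command not found in PATH: {}".format(command))

    # ConfigParser.read() silently skips files that do not exist, so a
    # missing credentials file is reported as a missing section.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(os.path.expanduser(credentials_path))
    if not parser.has_section(profile):
        problems.append(
            "missing [{}] section in {}".format(profile, credentials_path)
        )
    return problems
```

Printing the returned list and exiting non-zero when it is non-empty would stop the run before anything on the remote side is touched.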

@FrostyX
Member

FrostyX commented Dec 10, 2024

The ansible-fedora-copr playbooks should check that all dependencies are available on the local system

This wasn't trivial to implement, so I only documented the requirements in #3533

The chrony package is probably baked in the cloud AMI

Reported in #3553

On copr-distgit we unexpectedly got stuck when running /usr/bin/copr-dist-git-refresh-cgit

Reported in #3554

@nikromen nikromen moved this from In Progress to Done in CPT Kanban Dec 16, 2024