osbuild does not produce images with populated dnf state database #455

Open
Conan-Kudo opened this issue Jun 22, 2020 · 29 comments · Fixed by #1333
Labels
bug A description of a clear misbehavior of the application, which needs to be fixed.

Comments

@Conan-Kudo

Conan-Kudo commented Jun 22, 2020

Since #328, osbuild has split its software installation into two stages: sources for fetching content using DNF, and rpm for installing them into the target image environment. This would be fine, except... the rpm stage doesn't use DNF to install.

This is actually a problem, since it means that the generated images now lack the DNF state database information that is later used to make intelligent decisions about the system software in future transactions. For example, the lack of any state information means that dnf autoremove is fundamentally broken and will always do the wrong thing: since packages aren't installed via DNF, nothing is marked as user-installed or dep-installed accordingly.

Additionally, if modular content is installed this way, DNF is broken in the target image: the failsafe mechanism that was requested for RHEL modules will cause DNF to choke, since "modular" packages will be installed without the corresponding module metadata.

Of course, if you're producing images with no package manager, then this isn't a problem. Or if you aren't using modular content, then the damage is limited. But if you're building custom RHEL 8 images, then this is a problem.

Now, reading back through the history of why this happened, it looks like the goal was to avoid requiring network access for the build stages, presumably to provide a mechanism in which all the inputs could be archived and replayed to generate the same image reproducibly. This is definitely an admirable goal.

My suggestion would be to do the following:

  1. At the sources stage, depsolve for the content you need, and fetch it. Then generate an rpm-md repository.
  2. At the package install stage, configure DNF to use that particular local offline repository you've made (with module_hotfixes=1 so modular packages install), and use it to install software as requested, rather than taking the pile of RPMs and doing the installation by hand.
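The two steps above could look roughly like the following Python sketch. It only renders the `.repo` stanza and builds the `dnf` command line; it assumes the fetched RPMs were already turned into rpm-md metadata (for example with `createrepo_c`), and the repository id, paths, and release version are hypothetical placeholders. `module_hotfixes`, `reposdir`, `--installroot` and `--releasever` are regular DNF options; the helper functions are illustrative only.

```python
from pathlib import Path


def local_repo_config(repo_id: str, repo_dir: Path) -> str:
    """Render a DNF .repo stanza for a local, offline rpm-md repository.

    Assumes repo_dir already holds the fetched RPMs plus a repodata/
    directory (e.g. generated by createrepo_c during the sources stage).
    """
    lines = [
        f"[{repo_id}]",
        f"name=Offline build repository ({repo_id})",
        # file:// URL pointing at the directory that contains repodata/
        f"baseurl={repo_dir.resolve().as_uri()}",
        "enabled=1",
        # Assumption: RPMs were already verified against pinned checksums
        # when fetched, so signature checking is skipped here for brevity.
        "gpgcheck=0",
        # Lets modular packages install without matching module metadata.
        "module_hotfixes=1",
    ]
    return "\n".join(lines) + "\n"


def install_argv(root: Path, repos_dir: Path, releasever: str,
                 packages: list[str]) -> list[str]:
    """Build (but do not run) a dnf invocation that only sees repos_dir.

    Installing through DNF itself is what records install reasons
    (user-installed vs. dependency) in the image's state database.
    """
    return [
        "dnf", "-y",
        f"--installroot={root}",
        f"--releasever={releasever}",
        # Read .repo files only from our directory, i.e. only the local repo.
        f"--setopt=reposdir={repos_dir}",
        "install", *packages,
    ]
```

In use, the stanza would be written to a file such as `repos_dir / "offline.repo"` and the argv handed to `subprocess.run(..., check=True)` inside the build environment; no network access is needed because the only configured repository is on local disk.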

This strategy is actually how offline appliance-tools/livecd-tools and kiwi image builds are often done. You can just make that process automatic with osbuild.

Now this doesn't solve all the problems, since there's still the pesky issue of dealing with modular packages. One possible option would be to reposync the module out and merge that into your local repository's metadata. That would allow it to function the same way it does on a normal system, and have the correct tracking information so that the package manager works properly.

I'm open to ideas here, but the current way osbuild installs software into an image leads to images that potentially won't work as users expect them to.

@teg
Member

teg commented Jun 22, 2020

Thanks for trying out osbuild and for providing feedback.

First a minor correction: we split the regular DNF transaction in three: depsolving (libdnf), fetching (curl/librepo) and installing (rpm). osbuild handles the latter two, but depsolving is done externally in order to produce the manifest. The main reason for this is that image building should be deterministic, so we need to pin the content hashes of all our inputs.
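To make "pin the content hashes of all our inputs" concrete, here is a minimal Python sketch of checking a fetched file against a pinned digest. It assumes checksums are written as `<algorithm>:<hexdigest>` (e.g. `sha256:<hex>`); the function name is hypothetical and this is not osbuild's actual source code.

```python
import hashlib
import io
from typing import BinaryIO


def matches_pin(stream: BinaryIO, pinned: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the stream's digest equals the pinned "<algo>:<hex>" value."""
    algo, sep, expected = pinned.partition(":")
    if not sep or algo not in hashlib.algorithms_available:
        raise ValueError(f"unsupported checksum format: {pinned!r}")
    digest = hashlib.new(algo)
    # Hash in chunks so large RPMs are never read fully into memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest() == expected.lower()


if __name__ == "__main__":
    payload = b"example rpm payload"
    pin = "sha256:" + hashlib.sha256(payload).hexdigest()
    print(matches_pin(io.BytesIO(payload), pin))       # True
    print(matches_pin(io.BytesIO(b"tampered"), pin))   # False
```

Because the manifest carries digests rather than only URLs, any machine holding a file with the right digest can use it, which is what keeps builds deterministic even if the original URL later stops resolving.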

> dnf autoremove is fundamentally broken and will always do the wrong thing

We discussed this with the dnf team, and our understanding is that the current behaviour is arguably correct. The argument could also be made that we should explicitly mark some packages, I'd be happy to discuss that.

I certainly do not agree that the wrong thing will always happen. dnf autoremove does not remove any packages from a fresh image. But if you install a package manually and then remove it again, the newly installed dependencies will be removed too (though any packages part of the initial image will not).

Do you have some examples of behaviour you think shows that dnf autoremove is fundamentally broken on an osbuild-created image compared to one of the official RHEL/Fedora images?

> the current way osbuild installs software into an image is justifiably insane

Let's not get carried away.

@Conan-Kudo
Author

> Thanks for trying out osbuild and for providing feedback.
>
> First a minor correction: we split the regular DNF transaction in three: depsolving (libdnf), fetching (curl/librepo) and installing (rpm). osbuild handles the latter two, but depsolving is done externally in order to produce the manifest. The main reason for this is that image building should be deterministic, so we need to pin the content hashes of all our inputs.

That seems flawed in practice. It only works as long as all the content you used always remains available. Within the Red Hat ecosystem, this isn't true on Fedora or CentOS. It's technically not true on RHEL either if you work with the default repositories. The SUSE ecosystem is a bit better with how they handle service pack/point release updates for SLE and openSUSE Leap, but this still eventually becomes a problem there. And of course openSUSE Tumbleweed is rolling, so...

> dnf autoremove is fundamentally broken and will always do the wrong thing
>
> We discussed this with the dnf team, and our understanding is that the current behaviour is arguably correct. The argument could also be made that we should explicitly mark some packages, I'd be happy to discuss that.
>
> I certainly do not agree that the wrong thing will always happen. dnf autoremove does not remove any packages from a fresh image. But if you install a package manually and then remove it again, the newly installed dependencies will be removed too (though any packages part of the initial image will not).
>
> Do you have some examples of behaviour you think shows that dnf autoremove is fundamentally broken on an osbuild-created image compared to one of the official RHEL/Fedora images?

So, there are a few issues with this: if a package was explicitly requested by the user (which could be a library package that also ships a tool, since that's common in RH/Fedora), then if another application that required it is uninstalled and DNF considers nothing else to require it, it gets removed.

This has very real consequences. Packages like libcap fall into this bucket and can be autoremoved and break things.
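A simplified model of the autoremove behaviour being debated: a package is kept if it is reachable from something recorded as user-installed, and everything else is a removal candidate. This is a hypothetical illustration in Python, not DNF's actual solver (which also accounts for weak dependencies, protected packages and more), and `myapp` is a made-up package name.

```python
from collections import deque


def autoremove_candidates(requires: dict[str, set[str]],
                          user_installed: set[str]) -> set[str]:
    """Installed packages not reachable from any user-installed package.

    requires maps each installed package to the installed packages it needs.
    """
    installed = set(requires)
    keep: set[str] = set()
    queue = deque(p for p in user_installed if p in installed)
    while queue:
        pkg = queue.popleft()
        if pkg in keep:
            continue
        keep.add(pkg)
        # Follow dependency edges; everything reachable must stay.
        queue.extend(d for d in requires[pkg] if d in installed and d not in keep)
    return installed - keep


# State after "myapp" (which required libcap) has been removed again.
after_removal = {"bash": set(), "libcap": set()}

# libcap was explicitly requested but recorded only as a dependency:
print(autoremove_candidates(after_removal, {"bash"}))            # {'libcap'}
# libcap correctly recorded as user-installed:
print(autoremove_candidates(after_removal, {"bash", "libcap"}))  # set()
```

The same model also shows the other side of the argument: if every package is treated as user-installed (no state database at all), the candidate set is always empty, so nothing from the initial image is removed.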

If you're not willing to use DNF in offline mode to install the requested packages to populate the information correctly, you should at least use dnf mark to simulate the correct setup and mark the user-installed and dep-installed content properly. That will require a bit more work to make sure you figure out what to mark, but it's doable.
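A sketch of the `dnf mark` approach, assuming the build tool knows which packages were explicitly requested and which were pulled in by depsolving. It only builds the command lines, using DNF 4 syntax (`dnf mark install` records a package as user-installed, `dnf mark remove` demotes it to a dependency); DNF 5 spells these subcommands differently. The helper name and the installroot path are hypothetical.

```python
def mark_commands(root: str, requested: set[str],
                  resolved: set[str]) -> list[list[str]]:
    """Return dnf (v4) argv lists that record install reasons in `root`.

    requested: packages the user asked for.
    resolved:  the full depsolved set that actually ended up in the image.
    """
    missing = requested - resolved
    if missing:
        raise ValueError(f"requested but not resolved: {sorted(missing)}")
    dependencies = resolved - requested
    base = ["dnf", f"--installroot={root}", "mark"]
    commands = []
    if requested:
        # Explicitly requested packages: protect them from autoremove.
        commands.append(base + ["install", *sorted(requested)])
    if dependencies:
        # Everything else only stays while something still requires it.
        commands.append(base + ["remove", *sorted(dependencies)])
    return commands


for argv in mark_commands("/path/to/image-tree", {"vim-enhanced"},
                          {"vim-enhanced", "vim-common", "gpm-libs"}):
    print(" ".join(argv))
```

Splitting the marks into exactly two calls keeps the work proportional to the package set rather than issuing one `dnf` invocation per package.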

> the current way osbuild installs software into an image is justifiably insane
>
> Let's not get carried away.

Sorry, I'm just frustrated. This is one of these things that I do a lot of work in, both professionally and personally, and I've explored more than my fair share of tools and methods on doing it. I expected that osbuild would wind up doing this better than lorax did (which I personally disliked because the idea of using an installer for building images just adds a huge new dimension of problems, which thankfully other people finally noticed...).

I also note, we still don't have an answer here for modules...

@dvdhrm
Contributor

dvdhrm commented Jun 24, 2020

> That seems flawed in practice. It only works as long as all the content you used always remains available. Within the Red Hat ecosystem, this isn't true on Fedora or CentOS. It's technically not true on RHEL either if you work with the default repositories. The SUSE ecosystem is a bit better with how they handle service pack/point release updates for SLE and openSUSE Leap, but this still eventually becomes a problem there. And of course openSUSE Tumbleweed is rolling, so...

There is no requirement for osbuild manifests to be valid for longer than necessary. The content-addressed model is used to provide strong guarantees on what data ends up in an image. It is a communication object between osbuild-manifest creators (e.g., osbuild-composer) and the osbuild pipeline engine. The fact that such manifests will be outdated (or have unavailable sources) at one point does not negate their applicability.
Obviously, without the updates repository and with just the release repositories the osbuild manifests can be used for much longer. But I do not see why short-lived manifests lead to issues. Manifests are, more often than not, generated on-demand and have no long lifetime whatsoever.

Can you elaborate why you think this is "flawed in practice"?

> If you're not willing to use DNF in offline mode to install the requested packages to populate the information correctly, you should at least use dnf mark to simulate the correct setup and mark the user-installed and dep-installed content properly. That will require a bit more work to make sure you figure out what to mark, but it's doable.

This does not really respond to the situation Tom described, which is that we were told all packages are considered user installed if no DNF metadata is generated. If that is not true, please elaborate.

There is an argument to be made in favor of only marking a selected set of initial packages as user installed. We are aware of that, and we can easily do that by making dnf-json (in osbuild-composer) annotate the RPMs and then add a dnf mark stage to the resulting manifest (quoting Tom: "I'd be happy to discuss that.").

I would certainly be interested in a concrete example where the current model of osbuild fails.

> I also note, we still don't have an answer here for modules...

Can you elaborate which particular problems you see?

You mentioned the failsafe mechanism, but we only use the default modules (and none of these have skip_if_unavailable set, right?). Therefore, the failsafe mechanism would only be required if someone explicitly removes the default repositories (to my knowledge, this is not a supported use-case).
Once we allow selecting other modules, we will need additional stages. These will use dnf to enable particular repositories, and these will be required to copy the module-metadata into the dnf-database to guarantee it's available when the repository vanishes for whatever reason.

Similar to the dnf mark issue, I would be very happy if you can provide concrete examples where the current model fails.

@Conan-Kudo
Author

> That seems flawed in practice. It only works as long as all the content you used always remains available. Within the Red Hat ecosystem, this isn't true on Fedora or CentOS. It's technically not true on RHEL either if you work with the default repositories. The SUSE ecosystem is a bit better with how they handle service pack/point release updates for SLE and openSUSE Leap, but this still eventually becomes a problem there. And of course openSUSE Tumbleweed is rolling, so...
>
> There is no requirement for osbuild manifests to be valid for longer than necessary. The content-addressed model is used to provide strong guarantees on what data ends up in an image. It is a communication object between osbuild-manifest creators (e.g., osbuild-composer) and the osbuild pipeline engine. The fact that such manifests will be outdated (or have unavailable sources) at one point does not negate their applicability.
> Obviously, without the updates repository and with just the release repositories the osbuild manifests can be used for much longer. But I do not see why short-lived manifests lead to issues. Manifests are, more often than not, generated on-demand and have no long lifetime whatsoever.
>
> Can you elaborate why you think this is "flawed in practice"?

If manifests are not useful beyond the build process, there is no point in generating them. Full stop. Your existing set of inputs for your build model implies that it's possible to make reproducible image builds. However, you are (correctly) saying that this is functionally impossible in this ticket.

The way your inputs work essentially misleads users into thinking it's capable of more than it actually is. If you do not intend to support enforced version locking with reproducible inputs, then don't include a way to make people think that you can do it. Your thought process about manifests is completely the opposite of how every other system treats them, and so should not exist.

> If you're not willing to use DNF in offline mode to install the requested packages to populate the information correctly, you should at least use dnf mark to simulate the correct setup and mark the user-installed and dep-installed content properly. That will require a bit more work to make sure you figure out what to mark, but it's doable.
>
> This does not really respond to the situation Tom described, which is that we were told all packages are considered user installed if no DNF metadata is generated. If that is not true, please elaborate.
>
> There is an argument to be made in favor of only marking a selected set of initial packages as user installed. We are aware of that, and we can easily do that by making dnf-json (in osbuild-composer) annotate the RPMs and then add a dnf mark stage to the resulting manifest (quoting Tom: "I'd be happy to discuss that.").

This is true up to a point. However, the behavior of dnf autoremove is wonky when the DNF database isn't populated, and users have historically complained about leaves being unexpectedly removed because of this with PackageKit. That's why we try to make sure the DNF database is correctly populated with Lorax, LiveCD Tools, KIWI, and other image building tools.

> I would certainly be interested in a concrete example where the current model of osbuild fails.
>
> I also note, we still don't have an answer here for modules...
>
> Can you elaborate which particular problems you see?
>
> You mentioned the failsafe mechanism, but we only use the default modules (and none of these have skip_if_unavailable set, right?). Therefore, the failsafe mechanism would only be required if someone explicitly removes the default repositories (to my knowledge, this is not a supported use-case).
> Once we allow selecting other modules, we will need additional stages. These will use dnf to enable particular repositories, and these will be required to copy the module-metadata into the dnf-database to guarantee it's available when the repository vanishes for whatever reason.
>
> Similar to the dnf mark issue, I would be very happy if you can provide concrete examples where the current model fails.

My professional interest in OSBuild extends only insofar as I expect it to support modularity properly. My personal interest in OSBuild is to simplify the Fedora image building processes. In both cases, I need both default and non-default modules to work properly for image builds, and the resulting images must not be fundamentally broken. Right now, it would be a bad idea to use OSBuild even with default modules, because the resulting image is completely broken for ongoing usage.

Because you install software in the wrong way with OSBuild, there is no way I can trust that my image is any good for production use. If I apply configuration management to a long-running instance from this image, or if I provision a bare metal system from an image built by this system, I would expect package management to work. That will definitely not be the case with RHEL, and may not be the case with Fedora.

@teg
Member

teg commented Aug 31, 2020

> If manifests are not useful beyond the build process, there is no point in generating them. Full stop.

Just because you have not understood something does not mean there is no possible reason for it to exist. So, at the very least, statements like this come across as overconfident and take away from the rest of what you have to say.

I think the discussion would be more productive if you could point to practical problems you have found, ideally with instructions on how to reproduce them.

@Conan-Kudo
Author

> If manifests are not useful beyond the build process, there is no point in generating them. Full stop.
>
> Just because you have not understood something does not mean there is no possible reason for it to exist. So, at the very least, statements like this come across as overconfident and take away from the rest of what you have to say.

I understand the value of intermediate artifacts, but I do not believe it makes any sense to expose them to people like you want to. The confusion that it will cause was one thing I did point out, and you don't seem to have an answer for that.

@teg
Member

teg commented Aug 31, 2020

> If manifests are not useful beyond the build process, there is no point in generating them. Full stop.
>
> Just because you have not understood something does not mean there is no possible reason for it to exist. So, at the very least, statements like this come across as overconfident and take away from the rest of what you have to say.
>
> I understand the value of intermediate artifacts, but I do not believe it makes any sense to expose them to people like you want to. The confusion that it will cause was one thing I did point out, and you don't seem to have an answer for that.

You are right that the potential for confusion is something we must be aware of. In particular when/if these things are exposed in high-level tools.

I'd be happy to discuss high-level design decisions like that, but I don't think this is the right forum.

I am much more interested in your expertise on modularity and any issues you can actually point to there. We think we have our bases covered, but issues with reproducers would be very greatly appreciated.

@Conan-Kudo
Author

> I'd be happy to discuss high-level design decisions like that, but I don't think this is the right forum.

You have no other forum, so this seems pretty difficult for me to act on.

@teg
Member

teg commented Aug 31, 2020

> I'd be happy to discuss high-level design decisions like that, but I don't think this is the right forum.
>
> You have no other forum, so this seems pretty difficult for me to act on.

Feel free to open dedicated issues :)

@dvdhrm
Contributor

dvdhrm commented Sep 1, 2020

> There is no requirement for osbuild manifests to be valid for longer than necessary. The content-addressed model is used to provide strong guarantees on what data ends up in an image. It is a communication object between osbuild-manifest creators (e.g., osbuild-composer) and the osbuild pipeline engine. The fact that such manifests will be outdated (or have unavailable sources) at one point does not negate their applicability.
> Obviously, without the updates repository and with just the release repositories the osbuild manifests can be used for much longer. But I do not see why short-lived manifests lead to issues. Manifests are, more often than not, generated on-demand and have no long lifetime whatsoever.
> Can you elaborate why you think this is "flawed in practice"?
>
> If manifests are not useful beyond the build process, there is no point in generating them. Full stop.

You come here, commenting on a public open-source project, and telling its maintainers that there is no point in the project they do, "Full stop.". I find this rude and appalling and do not appreciate conversation in that tone. If you do not want to listen to arguments from our side ("Full stop."), this argument becomes tedious.

> Your existing set of inputs for your build model implies that it's possible to make reproducible image builds. However, you are (correctly) saying that this is functionally impossible in this ticket.

I did not say that. The osbuild engine can build all kinds of artifacts, and is not limited to Fedora release images. The fact that Fedora update repositories are ephemeral is a restriction of Fedora, not of osbuild.
Secondly, and I repeat myself, reproducibility does not necessarily imply infinite availability. The content-addressed manifest allows us to reason about image-builds simply based on the content of the manifest. It allows us to distribute image-builds without the need to verify signatures on each build machine. It allows us to cache intermediate artifacts without sacrificing coherency.

And, again, osbuild is designed to allow building more artifacts than just Fedora images (it is not even limited to OS Images).
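To illustrate how a content-addressed manifest lets intermediate results be cached coherently, here is a hypothetical Python sketch: each stage's identifier is a hash over its own definition plus the identifier of the tree it builds on, so any change upstream yields new identifiers downstream. This mirrors the idea described above but is not osbuild's actual ID scheme; the stage names and options are only examples.

```python
import hashlib
import json
from typing import Optional


def stage_id(name: str, options: dict, base_id: Optional[str]) -> str:
    """Content address of a stage: hash of its definition and its input tree."""
    # sort_keys and fixed separators give one canonical serialization per value.
    blob = json.dumps({"name": name, "options": options, "base": base_id},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def pipeline_ids(stages: list[tuple[str, dict]]) -> list[str]:
    """Chain stage ids so each one also commits to every stage before it."""
    ids: list[str] = []
    base: Optional[str] = None
    for name, options in stages:
        base = stage_id(name, options, base)
        ids.append(base)
    return ids


# Hypothetical pipelines that differ only in one pinned RPM checksum.
a = pipeline_ids([("rpm", {"packages": ["sha256:aaaa"]}), ("hostname", {"hostname": "x"})])
b = pipeline_ids([("rpm", {"packages": ["sha256:bbbb"]}), ("hostname", {"hostname": "x"})])
print(a[1] == b[1])  # False: the changed input invalidates every later cache entry
```

A cache keyed by such ids can be shared between build machines, because an entry is only reused when every input that produced it is byte-for-byte identical; no signature check is needed to decide whether a cached tree is valid.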

> The way your inputs work essentially misleads users into thinking it's capable of more than it actually is. If you do not intend to support enforced version locking with reproducible inputs, then don't include a way to make people think that you can do it.

We do intend to support "enforced version locking".

> Your thought process about manifests is completely the opposite of how every other system treats them, and so should not exist.

I joined this project because it does not align with the status quo, because it tries something new. I appreciate that. I enjoy thinking out of the box, denying the ordinary, walking where others refuse to go.

I completely disagree with the sentiment of your statement.

> This does not really respond to the situation Tom described, which is that we were told all packages are considered user installed if no DNF metadata is generated. If that is not true, please elaborate.
> There is an argument to be made in favor of only marking a selected set of initial packages as user installed. We are aware of that, and we can easily do that by making dnf-json (in osbuild-composer) annotate the RPMs and then add a dnf mark stage to the resulting manifest (quoting Tom: "I'd be happy to discuss that.").
>
> This is true up to a point. However, the behavior of dnf autoremove is wonky when the DNF database isn't populated, and users have historically complained about leaves being unexpectedly removed because of this with PackageKit. That's why we try to make sure the DNF database is correctly populated with Lorax, LiveCD Tools, KIWI, and other image building tools.

I am sorry, but this is quite vague. How am I supposed to test a failing dnf database if I cannot reproduce one? I previously asked you, and I have to repeat: I would certainly be interested in a concrete example where the current model of osbuild fails.

> My professional interest in OSBuild extends only insofar as I expect it to support modularity properly. My personal interest in OSBuild is to simplify the Fedora image building processes. In both cases, I need both default and non-default modules to work properly for image builds, and the resulting images must not be fundamentally broken. Right now, it would be a bad idea to use OSBuild even with default modules, because the resulting image is completely broken for ongoing usage.
> Because you install software in the wrong way with OSBuild, there is no way I can trust that my image is any good for production use. If I apply configuration management to a long-running instance from this image, or if I provision a bare metal system from an image built by this system, I would expect package management to work. That will definitely not be the case with RHEL, and may not be the case with Fedora.

Can you state a single example where a current osbuild manifest with default modules is "completely broken for ongoing usage"?

You repeatedly claim complete brokenness and definite unfitness of osbuild, while lacking any concreteness in your descriptions. It makes it hard for me to take this seriously, and makes me wonder what the intention of this inquiry is. I would very much appreciate suggestions on what parts to improve, and how. I would appreciate it if you linked to broken manifests or broken builds. But if your feedback aims to call osbuild "completely broken", to shut down arguments with "Full stop", and to assert dissidents "should not exist", then I fail to see value in this discussion.

@Conan-Kudo
Author

> There is no requirement for osbuild manifests to be valid for longer than necessary. The content-addressed model is used to provide strong guarantees on what data ends up in an image. It is a communication object between osbuild-manifest creators (e.g., osbuild-composer) and the osbuild pipeline engine. The fact that such manifests will be outdated (or have unavailable sources) at one point does not negate their applicability.
> Obviously, without the updates repository and with just the release repositories the osbuild manifests can be used for much longer. But I do not see why short-lived manifests lead to issues. Manifests are, more often than not, generated on-demand and have no long lifetime whatsoever.
> Can you elaborate why you think this is "flawed in practice"?
>
> If manifests are not useful beyond the build process, there is no point in generating them. Full stop.
>
> You come here, commenting on a public open-source project, and telling its maintainers that there is no point in the project they do, "Full stop.". I find this rude and appalling and do not appreciate conversation in that tone. If you do not want to listen to arguments from our side ("Full stop."), this argument becomes tedious.
>
> Your existing set of inputs for your build model implies that it's possible to make reproducible image builds. However, you are (correctly) saying that this is functionally impossible in this ticket.
>
> I did not say that. The osbuild engine can build all kinds of artifacts, and is not limited to Fedora release images. The fact that Fedora update repositories are ephemeral is a restriction of Fedora, not of osbuild.
> Secondly, and I repeat myself, reproducibility does not necessarily imply infinite availability. The content-addressed manifest allows us to reason about image-builds simply based on the content of the manifest. It allows us to distribute image-builds without the need to verify signatures on each build machine. It allows us to cache intermediate artifacts without sacrificing coherency.
>
> And, again, osbuild is designed to allow building more artifacts than just Fedora images (it is not even limited to OS Images).
>
> The way your inputs work essentially misleads users into thinking it's capable of more than it actually is. If you do not intend to support enforced version locking with reproducible inputs, then don't include a way to make people think that you can do it.
>
> We do intend to support "enforced version locking".
>
> Your thought process about manifests is completely the opposite of how every other system treats them, and so should not exist.
>
> I joined this project because it does not align with the status quo, because it tries something new. I appreciate that. I enjoy thinking out of the box, denying the ordinary, walking where others refuse to go.
>
> I completely disagree with the sentiment of your statement.

If you are already intending to support them like lock files, then it's fine to have them. But your answers above seemed to indicate that you insist on generating lock files while simultaneously knowing that they don't work the way people expect them to. There's being different, and there's breaking people's expectations.

Also, it's not just Fedora where this doesn't work. Virtually all distributions have this problem, except for openSUSE Leap and SUSE Linux Enterprise, since those two don't have a rolling repository for the major version that is "reset" when a new point release is made.

> This does not really respond to the situation Tom described, which is that we were told all packages are considered user installed if no DNF metadata is generated. If that is not true, please elaborate.
> There is an argument to be made in favor of only marking a selected set of initial packages as user installed. We are aware of that, and we can easily do that by making dnf-json (in osbuild-composer) annotate the RPMs and then add a dnf mark stage to the resulting manifest (quoting Tom: "I'd be happy to discuss that.").
>
> This is true up to a point. However, the behavior of dnf autoremove is wonky when the DNF database isn't populated, and users have historically complained about leaves being unexpectedly removed because of this with PackageKit. That's why we try to make sure the DNF database is correctly populated with Lorax, LiveCD Tools, KIWI, and other image building tools.
>
> I am sorry, but this is quite vague. How am I supposed to test a failing dnf database if I cannot reproduce one? I previously asked you, and I have to repeat: I would certainly be interested in a concrete example where the current model of osbuild fails.
>
> My professional interest in OSBuild extends only insofar as I expect it to support modularity properly. My personal interest in OSBuild is to simplify the Fedora image building processes. In both cases, I need both default and non-default modules to work properly for image builds, and the resulting images must not be fundamentally broken. Right now, it would be a bad idea to use OSBuild even with default modules, because the resulting image is completely broken for ongoing usage.
> Because you install software in the wrong way with OSBuild, there is no way I can trust that my image is any good for production use. If I apply configuration management to a long-running instance from this image, or if I provision a bare metal system from an image built by this system, I would expect package management to work. That will definitely not be the case with RHEL, and may not be the case with Fedora.
>
> Can you state a single example where a current osbuild manifest with default modules is "completely broken for ongoing usage"?

What is a "current osbuild manifest"? The ones in your samples? Your samples are fine, and cockpit-composer doesn't expose the ability to install modular content, so you can't hit this problem in either one. Your manifest format is quite complex and hand-crafting one to expose the problem is not straightforward. I can trivially do it with a shell script that emulates osbuild behavior, but actually making the manifest is quite painful.

> You repeatedly claim complete brokenness and definite unfitness of osbuild, while lacking any concreteness in your descriptions. It makes it hard for me to take this seriously, and makes me wonder what the intention of this inquiry is. I would very much appreciate suggestions on what parts to improve, and how. I would appreciate it if you linked to broken manifests or broken builds. But if your feedback aims to call osbuild "completely broken", to shut down arguments with "Full stop", and to assert dissidents "should not exist", then I fail to see value in this discussion.

Look, osbuild doing something different is interesting. But that doesn't mean you should ignore the real-world usage requirements. Nor should you ignore the realities of the environment you're working in. The problem with osbuild is that it's actually a great concept, but some of the details just aren't handled right.

@teg
Member

teg commented Sep 1, 2020

> If you are already intending to support them like lock files, then it's fine to have them. But your answers above seemed to indicate that you insist on generating lock files while simultaneously knowing that they don't work the way people expect them to.

I suggest opening up a separate issue if you want to discuss this. Though I'm struggling to see where you are coming from here. It is true that being able to always rebuild manifests would be nice, and it is also true that in many cases where we would like that, it is currently not possible. However, that is not the reason we have manifests, and I don't understand what practical problem their existence poses to you.

If you see a way to improve on this without breaking the properties we currently have and rely on, I'd be interested in hearing more about it.

> The problem with osbuild is that it's actually a great concept, but some of the details just aren't handled right.

If you open separate issues for each of the concerns you have, I think that would lead to a better discussion. Bear in mind, though, that we have many considerations to balance, so it is unlikely we will be able to give you exactly what you expect without also including features you don't care about.

@cgwalters
Contributor

Just for reference, rpm-ostree has a different model for "user installed" type data, xref https://blog.verbum.org/2020/08/22/immutable-%E2%86%92-reprovisionable-anti-hysteresis/
So this bug won't apply for osbuild generating rpm-ostree builds.

Also, rpm-ostree has had a lockfile implementation since this commit (migrated into Rust since then) which I think duplicates the osbuild locking.

> Also, it's not just Fedora where this doesn't work. Virtually all distributions have this problem.

In Fedora CoreOS, we ship using lock files, and we added the "archive" repository for exactly this reason. I think there's been some discussion about expanding it beyond FCOS (because really, having exactly one version on mirrors makes no sense in a world of object stores and CDNs).

@Conan-Kudo
Author

Right, so generally the lockfile in either osbuild or rpm-ostree is useless if you don't have something like the archive repository. And archive repositories are not going to be common, because they cost a lot of money to maintain, which most distributions and individuals cannot reasonably be expected to afford.
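A toy model of that argument, assuming a repository can be represented as the set of exact builds (name-version-release.arch strings) it currently serves. The helper name and every version number below are made up for illustration; real repositories publish repodata, not Python sets. A lock only reproduces a build while every pinned build is still downloadable from some configured repository:

```python
# Toy model: a lockfile only reproduces a build while every pinned build is still
# downloadable. Repositories are modeled as sets of exact build strings; real
# repositories publish repodata instead. All versions below are invented.


def unresolvable_pins(lockfile: dict[str, str], available: set[str]) -> list[str]:
    """Return the pinned builds that none of the configured repositories serve."""
    return sorted(build for build in lockfile.values() if build not in available)


# Lock recorded against the repository state at build time (package name -> build).
lock = {
    "bash": "bash-5.2.15-3.fc38.x86_64",
    "openssl": "openssl-3.0.8-2.fc38.x86_64",
}

# A regular mirror keeps only the newest update, so the locked openssl is gone.
mirror = {"bash-5.2.15-3.fc38.x86_64", "openssl-3.0.9-1.fc38.x86_64"}

# An archive repository also keeps superseded builds.
archive = mirror | {"openssl-3.0.8-2.fc38.x86_64"}

print(unresolvable_pins(lock, mirror))   # ['openssl-3.0.8-2.fc38.x86_64']
print(unresolvable_pins(lock, archive))  # []
```

With only the mirror configured, the older openssl pin cannot be satisfied; an archive that keeps superseded builds makes the same lock resolvable again.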

supakeen added a commit to supakeen/osbuild that referenced this issue Jun 27, 2023
This adjustment allows the definition of the mark with the RPMs and runs
DNF after installing the RPMs to put the proper markings in the DNF
state database. See osbuild#455.

This ensures that packages don't get removed during `autoremove` leading
to broken systems.
@supakeen
Member

supakeen commented Jun 27, 2023

It's been a while but I'll be diving into marking packages in the DNF state database as this needs to be resolved for the Fedora installer(s).

I'll likely mark user-requested packages (the top-level packages in packageSet and blueprint-requested packages) as user-installed and all the rest as dependencies, but I'll read up on `dnf mark` for a bit first.
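A minimal sketch of that marking plan, assuming `requested` holds the top-level names from the packageSet or blueprint and that the installed set comes from the depsolve result. `plan_marks` is a hypothetical helper for illustration, not osbuild's stage code:

```python
# Hypothetical helper: split the installed set into DNF "user" and "dependency"
# reasons. `requested` is assumed to hold the top-level packageSet/blueprint names.


def plan_marks(installed: list[str], requested: set[str]) -> dict[str, list[str]]:
    """Return sorted package names grouped by the install reason to record."""
    missing = requested.difference(installed)
    if missing:
        # A requested package that never got installed means the depsolve went wrong.
        raise ValueError(f"requested but not installed: {sorted(missing)}")
    user = sorted(p for p in installed if p in requested)
    dependency = sorted(p for p in installed if p not in requested)
    return {"user": user, "dependency": dependency}


installed = ["bash", "glibc", "openssh-server", "openssl-libs", "vim-enhanced"]
plan = plan_marks(installed, requested={"openssh-server", "vim-enhanced"})
print(plan["user"])        # ['openssh-server', 'vim-enhanced']
print(plan["dependency"])  # ['bash', 'glibc', 'openssl-libs']
```

Packages recorded as dependencies become candidates for `dnf autoremove` once no user-installed package requires them any more, which is why the top-level list has to be right.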

It also seems that modularity is proposed for removal in f39 which might simplify some things down the road.

supakeen added a commit to supakeen/osbuild that referenced this issue Jun 28, 2023
This adjustment allows the definition of the mark with the RPMs and runs
DNF after installing the RPMs to put the proper markings in the DNF
state database. See osbuild#455.

This ensures that packages don't get removed during `autoremove` leading
to broken systems.
@supakeen supakeen linked a pull request Jul 14, 2023 that will close this issue
@supakeen supakeen moved this to 🏗 In progress in Building Fedora Jul 14, 2023
@supakeen
Member

supakeen commented Jul 14, 2023

Sorry, this comment was blatantly wrong (if you got an email about it); I was mixing up VM images before being sufficiently caffeinated.

The comment used to say that there are no user-marked packages on Fedora VMs/ISOs, but in fact all kickstart-selected (and, I believe, anaconda- and lorax-selected) packages are user-marked. This implies that we will mark all top-level requested packages in either packageSet or blueprints as user-installed.

Still figuring out groups.

@supakeen supakeen moved this from 🏗 In progress to 💬 Discussion in Building Fedora Jul 14, 2023
@supakeen supakeen linked a pull request Jul 14, 2023 that will close this issue
supakeen added a commit to supakeen/osbuild that referenced this issue Jul 24, 2023
This adjustment allows the definition of the mark with the RPMs and runs
DNF after installing the RPMs to put the proper markings in the DNF
state database. See osbuild#455.

This ensures that packages don't get removed during `autoremove` leading
to broken systems.

This stage conditionally selects `dnf5` or `dnf` semantics.
supakeen added a commit to supakeen/osbuild that referenced this issue Jul 24, 2023
This adjustment allows the definition of the mark with the RPMs and runs
DNF after installing the RPMs to put the proper markings in the DNF
state database. See osbuild#455.

This ensures that packages don't get removed during `autoremove` leading
to broken systems.

Two stages are provided, one for dnf-3 and one for dnf5.
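How such a plan might be handed to the two DNF generations, as a sketch only. It assumes the tree is at a hypothetical `/tmp/tree`, and that the verbs are `mark install`/`mark remove` in dnf-3 and `mark user`/`mark dependency` in dnf5 (verify against the dnf(8) and dnf5 man pages for the target release). The helper is hypothetical and only builds argument lists; it runs nothing:

```python
# Hypothetical helper: turn a {"user": [...], "dependency": [...]} plan into argv
# lists. Assumed verbs: dnf-3 "mark install"/"mark remove", dnf5 "mark user"/
# "mark dependency". Nothing is executed here.


def mark_commands(plan: dict[str, list[str]], root: str, dnf5: bool) -> list[list[str]]:
    """Build one `mark` invocation per non-empty reason, targeting the tree at root."""
    if dnf5:
        base = ["dnf5", f"--installroot={root}", "mark"]
        verbs = {"user": "user", "dependency": "dependency"}
    else:
        base = ["dnf", f"--installroot={root}", "mark"]
        # dnf-3 has no "dependency" verb; "mark remove" drops the user-installed mark.
        verbs = {"user": "install", "dependency": "remove"}
    commands = []
    for reason in ("user", "dependency"):
        if plan.get(reason):
            commands.append(base + [verbs[reason], *plan[reason]])
    return commands


plan = {"user": ["vim-enhanced"], "dependency": ["glibc"]}
for argv in mark_commands(plan, "/tmp/tree", dnf5=False):
    print(" ".join(argv))
# dnf --installroot=/tmp/tree mark install vim-enhanced
# dnf --installroot=/tmp/tree mark remove glibc
```

Keeping the plan separate from the command spelling is one way to let a dnf-3 stage and a dnf5 stage share the same input.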
supakeen added a commit to supakeen/osbuild that referenced this issue Aug 3, 2023
This adjustment allows the definition of the mark with the RPMs and runs
DNF after installing the RPMs to put the proper markings in the DNF
state database. See osbuild#455.

This ensures that packages don't get removed during `autoremove` leading
to broken systems.

Two stages are provided, one for dnf-3 and one for dnf5.
supakeen added a commit to supakeen/osbuild that referenced this issue Aug 4, 2023
This adjustment allows the definition of the mark with the RPMs and runs
DNF after installing the RPMs to put the proper markings in the DNF
state database. See osbuild#455.

This ensures that packages don't get removed during `autoremove` leading
to broken systems.

Two stages are provided, one for dnf-3 and one for dnf5.
supakeen added a commit to supakeen/osbuild that referenced this issue Aug 11, 2023
This adjustment allows the definition of the mark with the RPMs and runs
DNF after installing the RPMs to put the proper markings in the DNF
state database. See osbuild#455.

This ensures that packages don't get removed during `autoremove` leading
to broken systems.
ondrejbudai pushed a commit to supakeen/osbuild that referenced this issue Aug 14, 2023
This adjustment allows the definition of the mark with the RPMs and runs
DNF after installing the RPMs to put the proper markings in the DNF
state database. See osbuild#455.

This ensures that packages don't get removed during `autoremove` leading
to broken systems.
ondrejbudai pushed a commit that referenced this issue Aug 14, 2023
This adjustment allows the definition of the mark with the RPMs and runs
DNF after installing the RPMs to put the proper markings in the DNF
state database. See #455.

This ensures that packages don't get removed during `autoremove` leading
to broken systems.
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in Building Fedora Aug 14, 2023
@supakeen supakeen moved this from ✅ Done to 🏗 In progress in Building Fedora Aug 15, 2023
@supakeen supakeen reopened this Aug 15, 2023
@supakeen supakeen assigned supakeen and unassigned supakeen Mar 6, 2024
@supakeen supakeen added the bug A description of a clear misbehavior of the application, which needs to be fixed. label Mar 6, 2024
@supakeen supakeen moved this from 🏗 In progress to 📋 Backlog in Building Fedora Mar 6, 2024
@supakeen
Member

Putting @richm's post here in full since it's also related:

For our use case, we need to have an option to keep dnf/yum metadata when installing packages.
Ansible system roles integration testing uses the Standard Test Interface https://docs.fedoraproject.org/en-US/ci/standard-test-interface/, specifically https://docs.fedoraproject.org/en-US/ci/standard-test-roles/. This means (ideally) running every test playbook against a clean VM. But this leads to performance issues: while startup/teardown of the VM has gotten faster, the most time-consuming aspect of a system roles run is package download and installation, especially for large packages and things like kernel modules (looking at you, storage kmod-vdo). We can use osbuild to build these images with the packages - but - every time the test goes to use the Ansible package module to install the packages, it checks the metadata, and has to rebuild the metadata, which takes so long as to defeat the original purpose.

So if we could build images with all of the packages and metadata, we could speed up our QE considerably.

@achilleas-k
Member

achilleas-k commented Oct 18, 2024

> We can use osbuild to build these images with the packages - but - every time the test goes to use the Ansible package module to install the packages, it checks the metadata, and has to rebuild the metadata, which takes so long as to defeat the original purpose.

This makes me wonder if it really is the same issue.

I'm not 100% clear on what's happening with the ansible module, so maybe @richm can check my assumptions here, but the way the issue is described makes me think that there's an ansible step that's doing (the equivalent of) dnf install <some packages> and the thing that's taking too long isn't the building of the package metadata but the repository metadata. As I understand it, the ansible module is run to install some packages and the hope is that the packages are preinstalled and no action should be required, but it first needs to download all the repo metadata.

For example, on a fresh system built by osbuild, running dnf list installed should take less than a second. Creating /var/lib/dnf/history.sqlite from local data isn't the issue here. The issue is that if you run dnf install <packages that are already installed>, it will take some time to download all the repo metadata before returning with "Nothing to do".

It sounds to me like, even with the dnf metadata, the problem will persist without repo metadata (or with repo metadata older than the metadata_expire value from dnf.conf(5)).
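To make the metadata_expire point concrete, here is a sketch of the freshness rule, assuming the value takes the dnf.conf(5) forms (plain seconds, a number with an `m`/`h`/`d` suffix, or `-1`/`never`) and that the time the cache was written is known. It illustrates the rule only and is not dnf's code; the helper names and timestamps are invented:

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Multipliers for the suffixed forms such as "90m", "6h" or "2d".
_UNITS = {"m": 60, "h": 3600, "d": 86400}


def parse_metadata_expire(value: str) -> timedelta | None:
    """Return the expiry interval, or None when the metadata never expires."""
    value = value.strip().lower()
    if value in ("-1", "never"):
        return None
    if value and value[-1] in _UNITS:
        return timedelta(seconds=int(value[:-1]) * _UNITS[value[-1]])
    return timedelta(seconds=int(value))


def cache_is_stale(fetched_at: datetime, now: datetime, metadata_expire: str) -> bool:
    """True when the cached repo metadata is older than metadata_expire allows."""
    interval = parse_metadata_expire(metadata_expire)
    return interval is not None and now - fetched_at > interval


built = datetime(2024, 10, 1, 12, 0, tzinfo=timezone.utc)      # cache written at image build
first_use = datetime(2024, 10, 18, 9, 0, tzinfo=timezone.utc)  # test run, over two weeks later
print(cache_is_stale(built, first_use, "6h"))     # True
print(cache_is_stale(built, first_use, "never"))  # False
```

Under this rule, a repository cache baked into an image at build time is likely already expired by the time a test uses the image, so dnf would refresh it before it can answer "Nothing to do".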

@richm

richm commented Oct 18, 2024

> As I understand it, the ansible module is run to install some packages and the hope is that the packages are preinstalled and no action should be required, but it first needs to download all the repo metadata.

I believe this is how the ansible module is working. Which means it isn't an Ansible or Ansible dnf module problem, it is a dnf install problem.

So is the issue that osbuild does not use the repo metadata at all during package installation? Or does it use the repo metadata but then remove it?

@supakeen
Member

> As I understand it, the ansible module is run to install some packages and the hope is that the packages are preinstalled and no action should be required, but it first needs to download all the repo metadata.
>
> I believe this is how the ansible module is working. Which means it isn't an Ansible or Ansible dnf module problem, it is a dnf install problem.
>
> So is the issue that osbuild does not use the repo metadata at all during package installation? Or does it use the repo metadata but then remove it?

It is currently not used at all. I've recently been doing some modularity-related work, which might mean I'll revisit the idea of having metadata available during and after build time (and it might lead to this issue being solved).

@achilleas-k
Member

> So is the issue that osbuild does not use the repo metadata at all during package installation? Or does it use the repo metadata but then remove it?

Package fetching and installation happen outside the operating system tree that's being built, so repository metadata is only available on the host that's generating the manifest (as a side effect of the depsolve). It would be technically possible to seed the new system with repository metadata, but I think that would be a strange choice. It would essentially be pre-loading caches on a cold system, caches that even have a relatively short expiration, meaning you'd probably want to refresh them on first boot anyway.

@achilleas-k
Member

The rpm repo metadata cache discussion is quite off topic from the original issue here however. Perhaps we should move this discussion back to the original issue if we want to keep talking about it.

Labels
bug A description of a clear misbehavior of the application, which needs to be fixed.
Projects
Status: 📋 Backlog
9 participants