Add a disk space check in pre-upgrade #1279

Closed
lucamosca1 opened this issue Aug 8, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@lucamosca1

It's common practice to partition servers to optimize and safeguard the system. With cloud providers, disks are often kept very small to save costs. In my case the root partition is 10 GB on every server. On one of them, the upgrade failed because there wasn't enough space to download and install all the required packages.
Maybe a pre-upgrade check could verify that the root partition has at least X GB free for a standard system and emit a warning (I would avoid blocking the process; that choice should be up to the admin).
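
A warning-only check of the kind suggested above could look roughly like the following Python sketch. The helper name `warn_if_low_space` and the 2 GiB default threshold are assumptions made for illustration and are not part of leapp; only `shutil.disk_usage` is a real standard-library call.

```python
import shutil

GIB = 1024 ** 3


def warn_if_low_space(path="/", min_free_gib=2.0, disk_usage=shutil.disk_usage):
    """Return a warning string if `path` has less than `min_free_gib` GiB free.

    Returns None when there is enough free space.
    Hypothetical helper: the name and the default threshold are illustrative only.
    """
    free_gib = disk_usage(path).free / GIB
    if free_gib < min_free_gib:
        return (
            "WARNING: only %.1f GiB free on %s, at least %.1f GiB is recommended"
            % (free_gib, path, min_free_gib)
        )
    return None
```

Calling `warn_if_low_space("/")` and printing the result when it is not None would only warn; the decision to continue stays with the admin.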

lucamosca1 added the bug label Aug 8, 2024
@pirat89
Member

pirat89 commented Aug 13, 2024

Hi @lucamosca1, can you share the following data?

  • rpm -qa | grep leapp
  • used OS (and version)
  • used cloud?

Also it would help to have the following df -h outputs:

  • before installing leapp-upgrade at all (so /var/lib/leapp is empty),
  • after running leapp preupgrade
  • after running leapp upgrade but before reboot
  • actual error output you hit

and /var/log/leapp/leapp-upgrade.log


The checks you describe have actually been implemented for some time. We have tested them on various setups with various amounts of free space, including edge cases where the free space is very close to the real limit at which the upgrade starts to fail, and on typical cloud systems too. Note that with the implemented checks it's hard to test right up to the "real" limits, because the solution also includes some reserves, so the upgrade is usually inhibited before you could get very close to them.

We would like to raise these limits much further to be really safe, that is, by several more GB on top of what is currently set for reserves. However, we know that would be problematic: a number of systems that can upgrade today would be blocked, and such an increase would be received negatively, as basically the majority of cloud systems would be blocked from upgrading when it's not necessary. So we do not have much room to raise the required reserves significantly. Note that the required free disk space is very dynamic. Since we implemented the current checks, reports about this issue have been extremely rare. We are aware that a 100% safe solution is basically not possible/feasible, so we implemented these best-effort checks.

To improve the solution further, we would need rpm to output the calculated required disk space for each partition. As rpm does not provide such output, we cannot optimise our estimations any further. We have discussed this with the RPM developers and they consider such a feature problematic due to RPM's design. There are some tricky hacks we could use to obtain this information, but right now they are not considered a good trade-off (basically we would have to make rpm think that all partitions are too small for anything and get the information from the printed errors). So consider this something we do not want in the code at all right now.

Then there is another problem: the disk space we check can still be consumed by various logs, apps, etc. before the reboot is executed. We have set some reserves to cover that, but in some corner cases this reserve may again not be enough.

I hope I have provided enough detail to explain the actual problems with required free disk space, so it's understood that this will never be completely error-proof, as a number of heuristics are used (even rpm uses a number of heuristics to work out how much space the transaction will need, and it's not 100% safe either). We can only improve the heuristics, but that's all. If people want to be safe with such an operation, having at least 5 times more free space per partition than the installed SW needs can be considered fairly safe, but again not in all cases.
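
The "5 times" rule of thumb above can be sketched in Python as follows. The helper name `has_safe_margin` and its input (a mapping from mount point to the bytes the new software is expected to need there) are assumptions for illustration; this is not how leapp or rpm computes its estimates.

```python
import shutil


def has_safe_margin(needed_bytes_by_mount, factor=5, disk_usage=shutil.disk_usage):
    """Map each mount point to True if its free space is >= factor * bytes needed.

    Hypothetical helper illustrating the rule of thumb only; the caller has to
    supply the per-partition estimate, which is the hard part in practice.
    """
    result = {}
    for mount, needed in needed_bytes_by_mount.items():
        free = disk_usage(mount).free
        result[mount] = free >= factor * needed
    return result
```

For example, `has_safe_margin({"/": 1_500_000_000})` would report whether the root partition has at least 7.5 GB free; the figure is made up for the example.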

@lucamosca1
Author

OK, that's perfectly understandable. Unfortunately I no longer have the server involved (it was CentOS 7 installed on an AWS EC2 instance), so I cannot share additional info.
My suggestion was to pick an arbitrary X GB of free disk space on the root partition below which a warning is raised before the upgrade proceeds. I didn't take into consideration the total and exact amount of space required by all the packages that have to be downloaded.

pirat89 transferred this issue from oamg/leapp Aug 13, 2024
@pirat89
Member

pirat89 commented Aug 13, 2024

Thanks for the info. In that case I think you used the Elevate project from AlmaLinux, which uses its own fork of leapp-repository. I am aware that for a long time they used an older leapp-repository version that was missing a number of fixes and features (this one included). It seems they have recently updated the fork to a version that should contain the changes, namely upstream version 0.19.0 and higher.

The original solution was very buggy (and produced misleading error messages as well) due to hacks we had to do to cover older XFS filesystems without the ftype (d_type) attribute. The solution was completely redesigned by these 2 PRs:

So I consider this problem resolved nowadays.

pirat89 closed this as completed Aug 13, 2024