Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set compression level for zstd to level 10 #396

Conversation

dirkmueller
Copy link
Contributor

This requires a good amount of saving due to larger dictionary size with only marginal more time effort for compression. decompression time is unaffected.

This requires a good amount of saving due to larger dictionary
size with only marginal more time effort for compression. decompression
time is unaffected.
@Conan-Kudo Conan-Kudo merged commit b6720b0 into rpm-software-management:master Nov 8, 2023
3 checks passed
@dralley
Copy link
Contributor

dralley commented Nov 13, 2023

An 18% compute tradeoff for a ~1% improvement might make sense for repos which are frequently downloaded, but the tradeoff is questionable for repos that will only be used a couple dozen, hundred or perhaps thousand times.

@praiskup Do you have an opinion on this? Would you classify it as positive, negative or basically irrelevant for a service like COPR?

@praiskup
Copy link
Member

In Copr the "production vs. consumption" ratio is much higher compared to normal distributions, and the compression performance matters. So as long as we could opt-out I think it is a more or less neutral proposal, but without an option this is negative. I'm not sure what would be the effect of this after moving Copr repos to PULP.

@dirkmueller
Copy link
Contributor Author

@praiskup without further patches, overall createrepo_c is I/O bound, not cpu bound. the shift from 9 to 10 is merely 20% increase which is well in the noise taking I/O overhead into account. I've benchmarked this extensively for Open Build Service usecases which is similar to COPR, and it is neutral.

@praiskup
Copy link
Member

Copr's createrepo_c runs are (very often) CPU bound. We use --recycle-pkglist and similar options most of the time to minimize I/O when re-generating repositories.

@dirkmueller
Copy link
Contributor Author

@praiskup understood, we're doing the same. still profiling indicated that the compression only adds 30% of the total runtime. anyway, if you feel strongly, I can make it an option, and then I would even be able to go to higher compression levels.

@dirkmueller
Copy link
Contributor Author

@praiskup just let me know if an extra option for 5% runtime (20% of 30%) is needed, and I'll do a PR

@praiskup
Copy link
Member

I do not recall all the details from #191 now, though you might be right that 70% is XML dumping, and 30% compression. But I/O is often more or less zero (adding one package into a repository with 100k RPMs and more).

Having an option would be nice. I missed that this is already merged; then others probably think it is worth it, let's deal with the option later. We'll report back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants