
Solve the problem of pod OOM caused by buffered data building up when read and write speeds are mismatched #3574

Open
wants to merge 1 commit into base: main

Conversation

mengkai514

Description: In most cases the storage cannot write fast enough, so the writes back up in memory and the pod gets OOM-killed.

What this PR does / why we need it:
When I use a DV to clone across storage classes (the image is very large; the storage is 2000Gi), kubectl get dv shows two restarts after the upload completes. According to the logs of the cdi-upload pod, if the image is raw it enters the ProcessingPhaseTransferDataFile func (ProcessingPhaseConvert is not executed) and performs an io.Copy. The problem appears to be that the cdi-upload pod is OOMKilled during io.Copy: the storage is too slow, data accumulates in the page cache, memory usage rises above the pod's resource limit, and the server pod is OOM-killed.

So there should be a way to flush a bounded amount of cached data to disk during the copy.
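To make that concrete, here is a minimal sketch (not the actual diff in this PR; the package and all names are hypothetical) of a writer wrapper that calls Sync() after a bounded number of bytes, so io.Copy cannot pile up an unbounded amount of dirty page cache when the destination storage is slow:

```go
package importer // hypothetical package; not the actual CDI code path

import (
	"io"
	"os"
)

// boundedSyncWriter forces dirty pages to disk every flushEvery bytes, so a
// fast reader cannot fill the page cache faster than slow storage drains it.
type boundedSyncWriter struct {
	f          *os.File
	flushEvery int64 // e.g. 32 MiB
	pending    int64 // bytes written since the last Sync
}

func (w *boundedSyncWriter) Write(p []byte) (int, error) {
	n, err := w.f.Write(p)
	w.pending += int64(n)
	if err == nil && w.pending >= w.flushEvery {
		w.pending = 0
		err = w.f.Sync() // block until the kernel has flushed the dirty pages
	}
	return n, err
}

// copyWithBoundedCache behaves like io.Copy but caps the dirty page cache
// that can build up between flushes.
func copyWithBoundedCache(dst *os.File, src io.Reader) (int64, error) {
	return io.Copy(&boundedSyncWriter{f: dst, flushEvery: 32 << 20}, src)
}
```

The tradeoff, as the review below points out, is that frequent fsyncs can hurt throughput on slow storage, which is one reason the maintainers prefer to leave flushing to the OS.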

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

Modify the write method used during io.Copy to prevent pod OOM.

Description: In most cases the storage cannot write fast enough, so the writes back up in memory and the pod gets OOM-killed.

Signed-off-by: mengkai <[email protected]>
@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Dec 19, 2024
@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign awels for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot
Contributor

Hi @mengkai514. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Collaborator

@akalenyu akalenyu left a comment


Thank you for posting the PR & issue 🙏

This is actually a very opinionated topic we've discussed many times before, most recently due to a kernel bug:
#3557 (comment)
And you can see some of the back and forth over the years using this PR query:
https://github.com/kubevirt/containerized-data-importer/pulls?q=is%3Apr+writeback+is%3Aclosed
Basically, the conclusion of these is that we would prefer to leave this up to the OS, and avoid O_DIRECT.

BTW, it should not be needed with cgroups v2 (which is the standard now):
under cgroups v2, the cache should get forced to disk much sooner, at roughly 0.3 * 600M
in the case of the CDI importer (dirty_ratio is considered against the container limit, not the host).
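To illustrate the arithmetic above (the 0.3 ratio and the 600M limit are taken from this comment and describe one particular setup, not universal defaults), a small sketch:

```go
package main

import "fmt"

func main() {
	// Figures from the comment above, treated as assumptions about one setup:
	// an effective dirty ratio of 0.3 and a 600Mi memory limit on the CDI pod.
	const dirtyRatio = 0.3
	const containerLimitMiB = 600.0

	// Under cgroups v2 the dirty-page threshold is evaluated against the
	// container's memory limit rather than total host memory, so writeback
	// should kick in once roughly this much dirty page cache accumulates.
	fmt.Printf("approximate writeback threshold: %.0f MiB\n", dirtyRatio*containerLimitMiB) // ~180 MiB
}
```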

I would like to poke more at your setup instead, specifically that kernel bug.
If we do come to the conclusion that the storage is "just slow" (this would make many fsyncs perform badly), maybe, in this case, the right thing would be to just configure higher memory limits for the CDI pods.

@mengkai514
Author

@akalenyu The PR you listed seems to be a little different from my problem, because mine does not involve qemu-img convert. My problem occurs in the Reader and Writer during the Copy phase, so the fsync approach you suggested may be able to solve it. However, raising the memory limits only solves the problem temporarily. For a long-term solution, I think it should be fsync plus cgroups v2, but there are probably still environments that do not support cgroups v2.

@akalenyu
Collaborator


cgroups v1 is no longer supported. I would really prefer that we don't have custom flushing logic in the project and that we inherit whatever behavior the OS knobs (dirty_ratio) give us.

Labels
dco-signoff: yes (Indicates the PR's author has DCO signed all their commits.)
release-note (Denotes a PR that will be considered when it comes time to generate release notes.)
size/M
3 participants