[NEW] Limit maximum size on disk of AOF files. Avoid disk full, long load times. #540

Open
fewtrell opened this issue May 23, 2024 · 0 comments · May be fixed by #1425

Comments

@fewtrell
Contributor

The problem/use-case that the feature addresses

If AOF rewrite fails repeatedly, the size of the *.aof files grows without bound. Rewrite can fail for a number of reasons; one of the more obvious is when both memory pressure and the write rate on the system are high.

If this continues, the following undesirable effects surface:
1 - the time to load the larger *.aof files will increase and may exceed some system goal.
2 - (more critical) the disk partition may fill up. At this point valkey will start failing write commands, which would normally lower memory pressure and allow rewrite to succeed, recovering the node. In this scenario, however, there is no free disk space for rewrite to succeed, so the node cannot recover.

Description of the feature
(Note: I plan to submit the pull request; I have a working change in our internal repository.)

Valkey already tracks server.aof_current_size

I propose adding a new configurable variable, aof_max_size, which acts like an artificial disk size. At the beginning of aofWrite() we can check the size against this limit and return an ENOSPC error if it is exceeded. Valkey will then treat the situation as disk full and begin blocking new write commands. However, no check is performed during AOF rewrite (or in other disk write functions), so AOF rewrite can still succeed and return the node to a usable state once disk space is freed.
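
To make the idea concrete, here is a minimal standalone sketch of such a check. It is not the actual change: the helper names `aof_size_cap_check` and `aof_write_capped` are hypothetical, the sizes are passed in as plain arguments instead of being read from the server struct, and treating a limit of 0 as "no limit" is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not Valkey code. Returns 0 if appending `len` bytes
 * keeps the file within `max_size`; otherwise sets errno to ENOSPC and
 * returns -1. Assumption: max_size <= 0 means "no limit configured". */
static int aof_size_cap_check(off_t current_size, size_t len, off_t max_size) {
    if (max_size <= 0) return 0;
    if (current_size >= max_size ||
        (unsigned long long)len > (unsigned long long)(max_size - current_size)) {
        errno = ENOSPC; /* Same error a genuinely full disk would produce. */
        return -1;
    }
    return 0;
}

/* Sketch of a capped write: run the check first, then fall through to
 * write(2). The rewrite path would simply not call this wrapper. */
static ssize_t aof_write_capped(int fd, const void *buf, size_t len,
                                off_t current_size, off_t max_size) {
    if (aof_size_cap_check(current_size, len, max_size) == -1) return -1;
    return write(fd, buf, len);
}
```

Because the caller only sees ENOSPC, it should not need a new error path: whatever handling already exists for a full disk applies unchanged.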

Alternatives you've considered
We considered putting the *.aof files in their own partition but that would make administering valkey much more complex for just a small behavior change.
We considered having valkey check the free space on disk against some threshold, but that would be a more complex feature to build and run. While it makes some sense for the avoid-disk-full use case, it is not so intuitive for the avoid-long-load-times use case.
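
For comparison, a free-space check along the lines of the second alternative could look roughly like the sketch below. It uses POSIX `statvfs`; the helper name `disk_free_below` and the idea of a byte threshold are assumptions for illustration only. Note that it needs a system call on every check and says nothing about how long the AOF will take to load.

```c
#include <sys/statvfs.h>

/* Hypothetical helper, not Valkey code. Returns 1 if the space available to
 * unprivileged users on the filesystem holding `path` is below
 * `min_free_bytes`, 0 if it is not, and -1 if statvfs() fails. */
static int disk_free_below(const char *path, unsigned long long min_free_bytes) {
    struct statvfs vfs;
    if (statvfs(path, &vfs) == -1) return -1;
    /* f_bavail is counted in units of f_frsize bytes. */
    unsigned long long avail =
        (unsigned long long)vfs.f_bavail * (unsigned long long)vfs.f_frsize;
    return avail < min_free_bytes ? 1 : 0;
}
```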

Additional information

n/a

kronwerk added a commit to kronwerk/valkey that referenced this issue Dec 11, 2024
Signed-off-by: kronwerk <[email protected]>

improved aof-max-size tests

Signed-off-by: kronwerk <[email protected]>