The problem/use-case that the feature addresses
If AOF rewrite fails repeatedly, the size of the *.aof files grows without bound. Rewrite can fail for a number of reasons; one of the more obvious is high memory pressure combined with a high write rate on the system.
If this continues, the following undesirable effects surface:
1 - the time to load the larger *.aof files will increase and may exceed some system goal.
2 - (more critical) the disk partition may fill up. At this point valkey will start failing write commands, which would normally lower memory pressure, allow rewrite to succeed, and recover the node. In this scenario, however, there is no free disk space left for rewrite to succeed, so the node cannot recover.
Description of the feature
(Note: I plan to submit the pull request; I have a working change in our internal repository.)
Valkey already tracks server.aof_current_size.
I propose adding a new configurable variable, aof_max_size, which acts like an artificial disk size. At the beginning of aofWrite() we can check the AOF size against this limit and return an ENOSPC error when it is exceeded. Valkey will then treat the situation as a full disk and begin blocking new write commands. However, no check is performed during AOF rewrite (or in other disk write functions), so AOF rewrite can still succeed and return the node to a usable state once the disk space is freed.
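The check described above could be sketched roughly as follows. This is an illustration only, not the change from the internal repository: the `aof_state` struct and the `aof_would_exceed_cap()` and `aof_write_capped()` helpers are hypothetical stand-ins, and treating `aof_max_size == 0` as "no limit" and counting the pending write length against the cap are assumptions the proposal does not spell out.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the relevant server fields. Only
 * aof_current_size is named in the issue as already tracked;
 * aof_max_size is the proposed new setting. */
struct aof_state {
    off_t aof_current_size; /* bytes currently in the AOF */
    off_t aof_max_size;     /* proposed cap in bytes; 0 = no limit (assumption) */
};

/* Returns 1 if appending len more bytes would push the AOF past the cap. */
int aof_would_exceed_cap(const struct aof_state *s, size_t len) {
    if (s->aof_max_size <= 0) return 0;                    /* cap disabled */
    if (s->aof_current_size >= s->aof_max_size) return 1;  /* already at the cap */
    unsigned long long room =
        (unsigned long long)(s->aof_max_size - s->aof_current_size);
    return (unsigned long long)len > room;
}

/* Simplified write path: fail up front with ENOSPC, as if the disk were
 * full, so the caller's existing disk-full handling takes over. The
 * rewrite path would not go through this check. */
ssize_t aof_write_capped(struct aof_state *s, int fd, const char *buf, size_t len) {
    if (aof_would_exceed_cap(s, len)) {
        errno = ENOSPC;
        return -1;
    }
    ssize_t n = write(fd, buf, len);
    if (n > 0) s->aof_current_size += n;
    return n;
}
```

Because the failure is reported through the same errno as a real full disk, no new error path is needed in the callers; only the place where the limit is checked is new.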
Alternatives you've considered
We considered putting the *.aof files in their own partition, but that would make administering valkey much more complex for just a small behavior change.
We considered having valkey check the free space on disk against some threshold, but that would be a more complex feature to build and execute. While it makes some sense for avoiding a full disk, it is less intuitive for avoiding long load times.
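For comparison, the free-space check in the second alternative might look roughly like the sketch below, assuming a POSIX system with statvfs(). The `free_space_below()` helper and its threshold parameter are hypothetical names used only for illustration. It needs an extra system call per check, and its threshold describes the partition rather than the AOF itself, which is why it maps poorly onto the load-time goal.

```c
#include <stdint.h>
#include <sys/statvfs.h>

/* Returns 1 if the space available to unprivileged users on the
 * filesystem holding `path` is below min_free_bytes, 0 if it is not,
 * and -1 if statvfs() fails. */
int free_space_below(const char *path, uint64_t min_free_bytes) {
    struct statvfs vfs;
    if (statvfs(path, &vfs) != 0) return -1;
    /* f_bavail counts free blocks in units of f_frsize bytes. */
    uint64_t free_bytes = (uint64_t)vfs.f_bavail * (uint64_t)vfs.f_frsize;
    return free_bytes < min_free_bytes;
}
```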
Additional information
n/a
kronwerk added a commit to kronwerk/valkey that referenced this issue on Dec 11, 2024