-
Notifications
You must be signed in to change notification settings - Fork 687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a note to conf about the dangers of modifying dir at runtime #887
Conversation
If a forked child process is running, modifying dir is not allowed. I can think of two problems at present: 1. If dir is modified during aof rewrite, we will leave the tmp files on the disk. Because the rename operation of the main process in backgroundRewriteDoneHandler will fail. Of course, we can stop aof rewrite when modifying dir, and restart aof rewrite after the modiry. 2. Both the parent and child processes may call serverLog to write logs, and modifying dir will cause the child process to write logs in the old dir and the parent process to write logs in the new dir. Because they fopen the file every time they call serverLog. (PS. we may need to change it to open only once in the future). Signed-off-by: Binbin <[email protected]>
Signed-off-by: Binbin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also be okay making this config immutable for Valkey 8.0. We've had security issues in the past with it, which is why we marked it as PROTECTED
. It seems fine to throw these temporary errors here as well.
Co-authored-by: Madelyn Olson <[email protected]> Signed-off-by: Binbin <[email protected]>
Signed-off-by: Binbin <[email protected]>
Signed-off-by: Binbin <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #887 +/- ##
============================================
+ Coverage 70.82% 70.84% +0.01%
============================================
Files 118 118
Lines 63561 63561
============================================
+ Hits 45020 45029 +9
+ Misses 18541 18532 -9 |
I actually seem to prefer making it immutable now, WDYT? @valkey-io/core-team |
I am ok to change this config from MODIFIABLE_CONFIG | PROTECTED_CONFIG | DENY_LOADING_CONFIG to IMMUTABLE_CONFIG | DENY_LOADING_CONFIG |
I'm okay also making it immutable. |
+1 on making |
Signed-off-by: Binbin <[email protected]>
Signed-off-by: Binbin <[email protected]>
Signed-off-by: Binbin <[email protected]>
Immutable is OK, as long as it works to set |
Signed-off-by: Binbin <[email protected]>
ok, i tested it manually and it's ok (behavior is the same as unstable) and i also added some tests hope we can cover it. Maybe someone else needs to test it manually to make sure i am not breaking anythings... |
I have encountered scenarios where the In such cases, users want to start a replica on another machine for migration. However, because the disk is read-only, a full synchronization cannot be performed (as RDB requires disk writes, which was not supported in earlier versions through diskless sync). In this situation, we recovered by modifying the This may not be a common case, but I am not sure if other users have other scenarios that require modifying the dir. |
Yes, this is indeed one of the uses of dir modification, i suddenly forgot about it. Speaking of read-only disk error, it will cause cluster-config-file saving to fail and cause the node to exit. Like the cluster is doing some cluster stuff, and the node update nodes.conf and fail and exit due to disk error. We once wanted to save nodes.conf somewhere else by changing dir, but in the end we gave up for some reasons (when system disk fails, it is not reliable to hang on the disk) and we are using a module to hack it. And btw, i would like to add a configuration item to control the killing itself thing, do you think it is necessary? I remember @PingXie seemed to have rejected it (but i think it is a configuration item that can be controlled by the administrator). Its suicide is passive and difficult to control.
|
Interesting, so maybe it's better to keep it mutable. Can we prevent changing it when we have a child process running? |
Interesting scenario, indeed. However, as @soloestoy mentioned, it's not a typical use case, and the concerns raised by @enjoy-binbin, such as temporary file leaks and split logs, are not a concern here IMO. I agree that keeping this config mutable makes sense but I see little value or need in hardening this code path. Instead, we should document it in the |
Right, |
Sorry @enjoy-binbin I completely lost the context on this one. Do you have a thread? |
ok, the link is #635, i take it wrong. |
ok, now there seem to be three options now:
vote? |
We have diskless replication now though, do we still need to worry about this case? |
yeah. although we can't say disk-based replication will never be used, diskless has been the default (since 7.0?). so this is even less of a concern IMO. I would vote for option 1. |
I look carefully which scenarios this parameter will be used again, there are 3 cases:
@soloestoy describe a typical use case, and maybe more cases are coming. And we have no idea how the clients behavior is, thus I think we should keep it mutable, but in valkey.conf we can highlight it is dangerous to change it during runtime. Thus, I vote 1. |
In Zhao's example, when the disk is full, it is now possible to replicate the data to another node. If disk-based sync is enabled, they can just change to diskless and replication will work. But if the disk is full and they just want to use another disk for AOF, RDB and nodes.conf, then mutable is more useful, maybe. Nodes.conf is not written by a fork, so this one is no problem. If it is risky to change the dir when a fork is running, then I think we can prevent it with option 2. But What is the worst thing that can happen if
Leaving temp files is maybe no disaster but it is a little annoying, if it is huge files. @enjoy-binbin also mentioned logging: the child process will continue logging in the old If it is only AOF files, then maybe we can do a similar trick as in #886? Allow |
I remember that the RDB files are ok, there is no crash and the RDB rename thing is ok (it is different than the AOF one). And i do remember it is only affect AOF files, yean i take try to store the dir and see if it can work. The logging one is not a real problem too i think, just a point that may be affected. |
Signed-off-by: Binbin <[email protected]>
Signed-off-by: Binbin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, now this PR is completely changed and only adds some comments. It is a very small and safe change.
We've had security issues in the past with it, which is why
we marked it as PROTECTED. But, modifying during runtime
is also a dangerous action. For example, when child processes
are running, persistent temp files and log files may have
unexpected effects.
A scenario for modifying dir at runtime is to migrate a disk
failure, such as using disk-based replication to migrate a node,
writing nodes.conf to save the cluster configuration.
We decided to leave it as is and add a note in the conf
about the dangers of modifying dir at runtime.