Assess the risk of data loss or other side effects of setting start_dirty_degraded flag #154
In a normal situation you should repair your failing RAID volume - likely by adding a new disk to replace the failed one in your RAID5 array. Follow 'man lvmraid' for info on how to repair a RAID volume. If the repair of the RAID volume is failing, provide a full log of your command and the kernel messages, along with the version of your kernel and tooling.
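For example, the basic repair flow described in lvmraid(7) looks roughly like this - a sketch only, with `vg0` and `raid5lv` as placeholder names:

```sh
# Show which image failed and the overall RAID health (placeholder VG name).
lvs -a -o name,segtype,devices,lv_health_status,sync_percent vg0

# Rebuild the failed image onto free space elsewhere in the VG.
lvconvert -y --repair vg0/raid5lv
```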
@zkabelac I have the same question. What is the probability of data corruption? Why does this message pop up at all?
@zkabelac, thank you. I've read it and performed a bit more investigation. Initially I have a RAID5 array made of 5 disks and a thin-provisioned volume. Once the actual data usage exceeds 85%, I extend the volume.
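For illustration, the extension step looks roughly like this (the VG and pool names are placeholders, not the real ones):

```sh
# Grow the thin pool sitting on the RAID5 LV by another 10GB (placeholder names).
lvextend -L +10G vg0/thinpool
```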
This adds another 10GB each time. I also tried other extension sizes.
I also collected an example status report of the volumes at that point.
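A report like that can be produced with standard `lvs` fields, for example (placeholder VG name):

```sh
# Pool usage plus RAID sync progress for all LVs, including hidden sub-LVs.
lvs -a -o name,segtype,attr,size,data_percent,sync_percent,raid_sync_action vg0
```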
Regardless of which size I add, the behaviour is the same. While synchronization is in progress, I eject one of the physical disks; I captured an example kernel log from just before the disk ejection onwards. A few seconds after the disk is lost, commands are run to replace the missing disk with a spare one.
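For illustration, a replacement sequence of that kind looks roughly like this (device and LV names are placeholders, not the ones actually used):

```sh
# Prepare the spare disk and add it to the volume group (placeholder names).
pvcreate /dev/sdf
vgextend vg0 /dev/sdf

# Rebuild the missing image, restricting allocation to the spare PV.
lvconvert -y --repair vg0/raid5lv /dev/sdf
```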
I also made a few runs with additional debug messages in the kernel md/raid code.
So I assume that the degraded disk is my newly added one (the spare). I doubt this logic works as expected, because there is no chance for a newly added disk not to be degraded. I also suspect, though I'm not sure, that this is exactly what happens here. If this is true, then there is no way to replace a faulty disk if the fault happens during thin-provisioned volume extension, before synchronisation is over. But under heavy write load, synchronisation is almost permanent. I tested this with my current kernel.
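One way to see whether the replacement image is still out of sync is the dm-raid status line, alongside LVM's own view (placeholder names; for a thin pool the RAID LV is typically the hidden `*_tdata` sub-LV):

```sh
# dm-raid health characters: 'A' = alive and in sync, 'a' = alive but not yet in sync, 'D' = dead.
dmsetup status vg0-raid5lv

# The same rebuild as LVM reports it.
lvs -a -o name,sync_percent,raid_sync_action vg0
```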
There is a `md_mod` kernel module parameter called `start_dirty_degraded`. It's documented mostly as a way to start a dirty, degraded, bootable RAID array, and it is meant to be used as a kernel command-line parameter.

Yet I have another scenario: there is a RAID 5 pool with a thin-provisioned volume. The volume is iteratively extended using `lvextend` as new data arrives. Each volume extension causes the pool to synchronise data. If I eject a physical disk during the synchronisation process and start `lvconvert -y --repair`, I expect the volume to be accessible as usual during the rebuild. In fact I get error messages from md in the kernel log, and the next `lvextend` operation fails. Finally the volume ends up in the `out_of_data` state, as it fails to extend any further.

Setting the `start_dirty_degraded` MD parameter helps to avoid the `lvextend` failure, and I haven't detected any issues with continuing to write data to the thin volume during the rebuild. In the kernel log, however, I see a warning that data corruption is possible.
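For reference, the usual ways to set this parameter (the sysfs path assumes md is loaded as the `md_mod` module; as far as I understand, changing it at runtime only affects arrays activated afterwards):

```sh
# On the kernel command line (when md is built into the kernel):
#     md_mod.start_dirty_degraded=1
# As a persistent module option, e.g. in /etc/modprobe.d/md.conf:
#     options md_mod start_dirty_degraded=1

# At runtime, before the degraded array is activated:
echo 1 > /sys/module/md_mod/parameters/start_dirty_degraded
cat /sys/module/md_mod/parameters/start_dirty_degraded   # verify the current value
```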
And that's the main issue - how can I estimate the probability of data loss when I use this flag?
Also, is there any other solution that wouldn't lead to possible data corruption?
Best regards.