Is it possible to automate the migration of data off a physical disk in an lvm group if an impending failure is detected?

Question

I'm building a non-critical local Samba share out of a bunch of hard drives I have lying around.

I'm thinking of using LVM to group them into a single logical volume, because LVM allows a lot of flexibility in modifying the underlying disk array.

For example, if I want to replace a physical disk, I can tell LVM to move the data off that disk. I can then disconnect it, add a new disk to the volume group, and extend the filesystem to include it.

I was wondering whether it is possible to automatically detect an impending drive failure (SSD or HDD) from SMART information, so that once a drive passes a certain health threshold, the system automatically moves the data off that physical disk (using pvmove, assuming there is room on the other disks) and removes it from the volume group.

Obviously this is not viable for mission-critical production systems, but in a home lab it would help avoid losing the data on all the drives, while providing something of an alternative to parity-based redundancy where that is not possible (as in my case, where I have a mixture of SSDs and HDDs of varying sizes).

I'd imagine this is possible, but does something like this already exist?

frostschutz · Accepted Answer · 2024-07-27 09:27:55Z

pvmove is great but it expects drives to work 100%.

If there is a read / write error, it is only indirectly visible, through dmesg:

# read error:
device-mapper: raid1: Unable to read primary mirror during recovery
# write error:
device-mapper: raid1: Write error during recovery (error = 0x1)

In both cases, pvmove reports success anyway, but data is missing. Effectively this results in "silent data corruption".
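Since pvmove itself does not report these failures, one option is to scan the kernel log for the two messages quoted above after a move has finished. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the messages appear verbatim in the `dmesg` output, which may vary between kernel versions.

```python
from __future__ import annotations

import re
import subprocess

# The two device-mapper messages quoted above. Only these substrings are
# matched, so a timestamp prefix added by dmesg does not matter.
PVMOVE_ERROR_PATTERNS = {
    "read": re.compile(r"device-mapper: raid1: Unable to read primary mirror during recovery"),
    "write": re.compile(r"device-mapper: raid1: Write error during recovery"),
}


def find_pvmove_errors(kernel_log: str) -> list[tuple[str, str]]:
    """Return (kind, line) pairs for each log line matching a known message."""
    hits = []
    for line in kernel_log.splitlines():
        for kind, pattern in PVMOVE_ERROR_PATTERNS.items():
            if pattern.search(line):
                hits.append((kind, line.strip()))
    return hits


def read_kernel_log() -> str:
    """Hypothetical helper: fetch the kernel log via dmesg (usually needs root)."""
    return subprocess.run(["dmesg"], capture_output=True, text=True, check=False).stdout
```

Calling `find_pvmove_errors(read_kernel_log())` after a pvmove run returns an empty list when neither message was logged. An empty list only means these two messages were not seen; it is not proof that the move was clean.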

Of course, if you use ddrescue instead, the end result might be similar. But at least with ddrescue, you get a proper report of how much could be recovered, plus the byte offsets of the missing data (in the mapfile).
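To illustrate what the mapfile gives you, here is a rough sketch that lists the byte ranges ddrescue has not marked as finished. It assumes the usual mapfile layout (comment lines starting with `#`, one current-status line, then `pos size status` rows with `+` meaning recovered) and hexadecimal numbers as ddrescue writes them; the function name is invented for this example.

```python
def unrecovered_ranges(mapfile_text):
    """Return (start, size, status) for every block not marked finished ('+')."""
    rows = [
        line.split()
        for line in mapfile_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    ranges = []
    # rows[0] is the current_pos / current_status line, not a data block.
    for fields in rows[1:]:
        pos, size, status = int(fields[0], 0), int(fields[1], 0), fields[2]
        if status != "+":
            ranges.append((pos, size, status))
    return ranges


def missing_bytes(mapfile_text):
    """Total number of bytes in blocks that are not marked finished."""
    return sum(size for _, size, _ in unrecovered_ranges(mapfile_text))
```

With a list like this you can work out which files or logical extents overlap the damaged ranges, which is exactly the information pvmove does not give you.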

And sometimes, re-reads are successful. This is especially the case when read errors are intermittent, like cable or controller problems.

So if you know you have a problematic drive, just go with ddrescue directly.

For the automation aspect, smartd can execute customized scripts so you could set up some rules as per your preferences. But there are plenty of errors that don't even show up in SMART, so I'm not sure how useful it would be.
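As a sketch of what such a script could look like: smartd can be told to run a program with `-M exec` in smartd.conf, and it passes details in environment variables such as `SMARTD_DEVICE` and `SMARTD_FAILTYPE`. The example below is deliberately a dry run that only prints the commands it would issue. The volume group name `vg_share`, the set of failure types to act on, and the assumption that the whole disk (not a partition) is the physical volume are all placeholders, not recommendations.

```python
import os
import shlex

# Assumption: these smartd failure types are considered serious enough to act on.
ACT_ON = {"Health", "CurrentPendingSector", "OfflineUncorrectableSector"}


def planned_commands(env, volume_group):
    """Build the LVM commands for evacuating the disk smartd complained about.

    Returns an empty list when the event is not one we act on.
    Assumes SMARTD_DEVICE is itself the physical volume (e.g. /dev/sdb).
    """
    device = env.get("SMARTD_DEVICE")
    failtype = env.get("SMARTD_FAILTYPE")
    if not device or failtype not in ACT_ON:
        return []
    return [
        ["pvmove", device],                   # move all extents off the disk
        ["vgreduce", volume_group, device],   # then drop it from the volume group
    ]


def main():
    # Dry run: print instead of executing, given the pvmove caveats above.
    for cmd in planned_commands(os.environ, "vg_share"):
        print("would run:", shlex.join(cmd))
```

To use it as a smartd hook you would call `main()` at the end of the script. Actually executing the commands instead of printing them brings back the problem described above: pvmove can report success on a disk that is already returning errors.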

For reacting to read/write errors directly, I think it's only possible with mdadm RAID and similar. If you do this indirectly (by watching SMART, dmesg, etc.), there will always be some data loss involved.
