Revisions to Linux - Repairing bad blocks on a RAID1 array with GPT

add a script

edited Jun 16, 2014 at 16:21

Script for all RAID devices on the system

A while back, I wrote this script to "repair" all RAID devices on the system. It was written for older kernel versions, where only 'repair' would fix a bad sector; on newer kernels just doing a 'check' is sufficient. ('repair' still works fine on newer kernels, but it also re-copies/rebuilds parity, which isn't always what you want, especially on flash drives.)

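#!/bin/bash
# Start a 'repair' on every md array in turn, showing progress, then start
# SMART offline data collection on every /dev/sd? disk. Needs root.

[ "$(id -u)" -eq 0 ] || { echo "ERROR: must be run as root." >&2; exit 1; }
shopt -s nullglob               # no arrays/disks: the loops simply don't run

save="$(tput sc)"               # save cursor position
clear="$(tput rc)$(tput el)"    # restore cursor, then clear to end of line
for sync in /sys/block/md*/md/sync_action; do
    md="${sync#/sys/block/}"    # /sys/block/md0/md/sync_action -> md0
    md="${md%%/*}"
    cmpl="/sys/block/$md/md/sync_completed"

    # check current state and get it repairing.
    read -r current < "$sync"
    case "$current" in
        idle)
            echo 'repair' > "$sync"
            ;;
        repair)
            echo "WARNING: $md already repairing" >&2
            ;;
        check)
            echo "WARNING: $md checking, aborting check and starting repair" >&2
            echo 'idle' > "$sync"
            echo 'repair' > "$sync"
            ;;
        *)
            echo "ERROR: $md in unknown state $current. ABORT." >&2
            exit 1
            ;;
    esac

    # show sync_completed until the array goes back to idle
    echo -n "Repair $md...$save" >&2
    read -r current < "$sync"
    while [ "$current" != "idle" ]; do
        read -r stat < "$cmpl"
        echo -n "$clear $stat" >&2
        sleep 1
        read -r current < "$sync"
    done
    echo "$clear done." >&2
done

for dev in /dev/sd?; do
    echo "Starting offline data collection for $dev."
    smartctl -t offline "$dev"
done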
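The progress loop above prints sync_completed verbatim, which reads as "<done> / <total>" in sectors while an action is running and "none" otherwise. A minimal sketch of a helper that turns it into a percentage, assuming that format; the name sync_percent is hypothetical, not something mdadm or the kernel provides:

```shell
#!/bin/bash
# Sketch: convert the contents of /sys/block/mdX/md/sync_completed
# ("<done> / <total>" sectors) into a percentage. Anything that does not
# match that format (e.g. "none") is passed through unchanged.
# sync_percent is a hypothetical helper name.
sync_percent() {
    local cur slash total
    read -r cur slash total <<< "$1"
    if [[ $cur =~ ^[0-9]+$ && $slash == / && $total =~ ^[0-9]+$ ]] && (( total > 0 )); then
        printf '%d%%\n' $(( cur * 100 / total ))
    else
        printf '%s\n' "$1"
    fi
}
```

In the script's progress loop, echo -n "$clear $(sync_percent "$stat")" >&2 would then show a percentage instead of raw sector counts.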

If you want to do a check instead of a repair, this (untested) replacement for the first block, the case statement, should work:

    case "$current" in
        idle)
            echo 'check' > "$sync"
            ;;
        repair|check)
            echo "NOTE: $md $current already in progress." >&2
            ;;
        *)
            echo "ERROR: $md in unknown state $current. ABORT." >&2
            exit 1
            ;;
    esac
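When a check runs to completion, the kernel records in each array's mismatch_cnt how many sectors it found where the mirror copies disagreed; a check only counts these, while a repair rewrites them. A minimal sketch for reporting that on every array, assuming the standard /sys/block/mdX/md/mismatch_cnt attribute; the function name and its optional root argument (there only so it can be tried against a fake directory tree) are hypothetical:

```shell
#!/bin/bash
# Sketch: print mismatch_cnt for every md array after a 'check' finished.
# A nonzero value means the copies differed in that many sectors.
# report_mismatches and its optional sysfs-root argument are hypothetical.
report_mismatches() {
    local root="${1:-/sys/block}" f md cnt
    for f in "$root"/md*/md/mismatch_cnt; do
        [ -e "$f" ] || continue            # glob matched nothing
        md=${f#"$root"/}                   # md0/md/mismatch_cnt
        md=${md%%/*}                       # md0
        read -r cnt < "$f"
        if [ "$cnt" -eq 0 ]; then
            echo "$md: clean"
        else
            echo "$md: mismatch_cnt=$cnt"
        fi
    done
}
```

Running report_mismatches as root after the check loop lists one line per array.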

edited body

edited Sep 10, 2012 at 16:21

derobert

All these "poke the sector" answers are, quite frankly, insane. They risk (possibly hidden) filesystem corruption. If the data were already gone because that disk held the only copy, poking the sector would be reasonable. But there is a perfectly good copy on the mirror.

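You just need to have mdraid scrub the mirror. It'll notice the bad sector and rewrite it automatically: the read error makes md fetch that data from the other disk and write it back over the bad sector, which lets the drive remap it if necessary.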

# echo 'check' > /sys/block/mdX/md/sync_action    # use 'repair' instead for older kernels

You need to put the right device in there (e.g., md0 instead of mdX). This will take a while, as it scrubs the entire array by default. On a new enough kernel, you can write sector numbers to sync_min/sync_max first, to limit it to only a portion of the array.
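A minimal sketch of the sync_min/sync_max approach for one array. It assumes the behaviour described in the kernel's md documentation: the values are array sectors, and when the check reaches sync_max it pauses rather than finishing, so the function watches sync_completed, then writes idle and restores the default range (0 and max) so later full scrubs aren't silently limited. Note these are sectors of the md array, not the member disk's LBA; a bad LBA reported by SMART would first need the partition start and the md data offset (shown by mdadm --examine) subtracted. The function name, its arguments and the optional sysfs-root parameter are hypothetical:

```shell
#!/bin/bash
# Sketch: run 'check' over only array sectors [start, end) of one md array
# via sync_min/sync_max, then restore the default full range.
# Assumes (per the kernel md docs) the check pauses at sync_max instead of
# finishing, so we watch sync_completed and stop it ourselves.
# check_range and its optional sysfs-root argument are hypothetical; the
# root argument exists only so the function can be tried on a fake tree.
check_range() {
    local md="$1" start="$2" end="$3" root="${4:-/sys/block}"
    local dir="$root/$md/md" action cur rest

    if ! [[ $start =~ ^[0-9]+$ && $end =~ ^[0-9]+$ ]] || (( end <= start )); then
        echo "usage: check_range mdN START END [SYSFS_ROOT] (sectors, END > START)" >&2
        return 2
    fi
    read -r action < "$dir/sync_action" || return 1
    if [ "$action" != idle ]; then
        echo "$md is busy ($action); not starting" >&2
        return 1
    fi

    # Open sync_max up first so the new sync_min is accepted, then narrow it.
    echo max      > "$dir/sync_max"
    echo "$start" > "$dir/sync_min"
    echo "$end"   > "$dir/sync_max"
    echo check    > "$dir/sync_action"

    # Wait until the check either ends on its own or reaches sync_max.
    while :; do
        read -r action < "$dir/sync_action"
        [ "$action" = idle ] && break
        read -r cur rest < "$dir/sync_completed"   # "<done> / <total>"
        [[ $cur =~ ^[0-9]+$ ]] && (( cur >= end )) && break
        sleep 1
    done

    # Stop the paused check and put the range back for future full scrubs.
    # (If a write fails with "Device or resource busy", retry after a moment.)
    echo idle > "$dir/sync_action"
    echo 0    > "$dir/sync_min"
    echo max  > "$dir/sync_max"
}
```

For example, check_range md0 1000000 1100000 would check roughly 50 MB of md0 starting about 488 MiB in.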

This is a safe operation. You can do it on all of your mdraid devices; in fact, you should do it on all your mdraid devices, regularly. Your distro likely ships with a cron job to handle this; you may just need to do something to enable it.
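To see whether the scrub actually dealt with the bad sector, one option is to compare the drive's pending-sector count (SMART attribute 197, Current_Pending_Sector) before and after; once md has rewritten the sector, the drive should normally stop counting it as pending. A minimal sketch that extracts the raw value from smartctl -A output, assuming an ATA drive with the usual attribute table where RAW_VALUE is the tenth column (NVMe drives and some SSDs don't report this attribute); the function name is hypothetical:

```shell
#!/bin/bash
# Sketch: read "smartctl -A" output on stdin and print the raw
# Current_Pending_Sector count (attribute 197). Exits 1 if the attribute
# is absent. Assumes RAW_VALUE is the 10th column of the attribute table.
# pending_sectors is a hypothetical helper name.
pending_sectors() {
    awk '$1 == "197" || $2 == "Current_Pending_Sector" { print $10; found = 1; exit }
         END { if (!found) exit 1 }'
}
```

Usage: smartctl -A /dev/sda | pending_sectors, once before the scrub and once after it.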

answered Sep 6, 2012 at 15:18

derobert