add a script
derobert

Script for all RAID devices on the system

A while back, I wrote this script to "repair" all RAID devices on the system. It was written for older kernel versions, where only 'repair' would fix the bad sector; now a 'check' is sufficient. (Repair still works fine on newer kernels, but it also re-copies/rebuilds parity, which isn't always what you want, especially on flash drives.)

#!/bin/bash

save="$(tput sc)";
clear="$(tput rc)$(tput el)";
for sync in /sys/block/md*/md/sync_action; do
    md="$(echo "$sync" | cut -d/ -f4)"
    cmpl="/sys/block/$md/md/sync_completed"

    # check current state and get it repairing.
    read current < "$sync"
    case "$current" in
        idle)
            echo 'repair' > "$sync"
            true
            ;;
        repair)
            echo "WARNING: $md already repairing"
            ;;
        check)
            echo "WARNING: $md checking, aborting check and starting repair"
            echo 'idle' > "$sync"
            echo 'repair' > "$sync"
            ;;
        *)
            echo "ERROR: $md in unknown state $current. ABORT."
            exit 1
            ;;
    esac
    
    echo -n "Repair $md...$save" >&2
    read current < "$sync"
    while [ "$current" != "idle" ]; do
        read stat < "$cmpl"
        echo -n "$clear $stat" >&2
        sleep 1
        read current < "$sync"
    done
    echo "$clear done." >&2;
done

for dev in /dev/sd?; do
    echo "Starting offline data collection for $dev."
    smartctl -t offline "$dev"
done

If you want to run check instead of repair, replacing the first case block in the script with this (untested) one should work:

    case "$current" in
        idle)
            echo 'check' > "$sync"
            true
            ;;
        repair|check)
            echo "NOTE: $md $current already in progress."
            ;;
        *)
            echo "ERROR: $md in unknown state $current. ABORT."
            exit 1
            ;;
    esac

All these "poke the sector" answers are, quite frankly, insane. They risk (possibly hidden) filesystem corruption. If the data were already gone, because that disk stored the only copy, it'd be reasonable. But there is a perfectly good copy on the mirror.

You just need to have mdraid scrub the mirror. It'll notice the bad sector, and rewrite it automatically.

# echo 'check' > /sys/block/mdX/md/sync_action    # use 'repair' instead for older kernels

You need to put the right device in there (e.g., md0 instead of mdX). This will take a while, as it does the entire array by default. On a new enough kernel, you can write sector numbers to sync_min/sync_max first, to limit it to only a portion of the array.
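
Limiting the scrub with sync_min/sync_max could look roughly like the sketch below. The function name check_range and the SYSFS_ROOT variable are my own illustrative names, not part of mdadm or the kernel; only the sync_min, sync_max and sync_action files are the real sysfs interface. As far as I understand the kernel's md documentation, the limits are in sectors and should be a multiple of the chunk size on striped arrays, so treat the numbers as placeholders.

```shell
#!/bin/bash
# Sketch only: scrub a sector range of one md array.
# SYSFS_ROOT and check_range are hypothetical names used for illustration.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/block}"

check_range() {
    local md="$1" start="$2" end="$3"
    local dir="$SYSFS_ROOT/$md/md"
    local state

    # Only touch the limits while nothing else is running on the array.
    read -r state < "$dir/sync_action" || return 1
    if [ "$state" != "idle" ]; then
        echo "ERROR: $md is busy ($state), not changing limits." >&2
        return 1
    fi

    # Reset the lower limit first so the new upper limit is never below it,
    # then set the upper limit, then raise the lower limit to the start.
    echo 0        > "$dir/sync_min" || return 1
    echo "$end"   > "$dir/sync_max" || return 1
    echo "$start" > "$dir/sync_min" || return 1

    # Start the scrub; it covers only the sectors between the two limits.
    echo 'check' > "$dir/sync_action"
}

# Example (as root; sector numbers are placeholders):
#   check_range md0 1048576 2097152
```

My understanding is that the check pauses when it reaches sync_max rather than finishing, so afterwards you would write 'idle' to sync_action and 'max' to sync_max to put the array back to normal. Verify that on your kernel before relying on it.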

This is a safe operation. You can do it on all of your mdraid devices. In fact, you should do it on all your mdraid devices, regularly. Your distro likely ships with a cron job to handle this; you may just need to enable it.
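
If your distro does not ship such a job, a minimal sketch of one might look like this. It starts a check on every idle array and returns immediately instead of waiting. The names start_checks and SYSFS_ROOT, and the install path and schedule in the comment, are assumptions for illustration; prefer whatever your distro provides if it has something.

```shell
#!/bin/bash
# Sketch only: kick off a check on every idle md array, without waiting.
# SYSFS_ROOT and start_checks are hypothetical names used for illustration.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/block}"

start_checks() {
    local sync md state
    for sync in "$SYSFS_ROOT"/md*/md/sync_action; do
        # With no arrays the glob stays literal, so skip missing files.
        [ -e "$sync" ] || continue

        # Extract the array name (e.g. md0) from the path.
        md="${sync%/md/sync_action}"
        md="${md##*/}"

        read -r state < "$sync" || continue
        if [ "$state" = "idle" ]; then
            echo 'check' > "$sync" && echo "started check on $md"
        else
            echo "skipping $md: $state in progress"
        fi
    done
}

# Hypothetical crontab entry, assuming the script ends with a call to
# start_checks and is installed at this made-up path:
#   0 1 * * 0  root  /usr/local/sbin/md-check-all
```

The block only defines the function; add a final start_checks line before running it from cron.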
