I received a warning during bootup that a hard drive has failed.

My first thought was it's probably one of the two older HDDs I have configured as a software RAID1 array.

So as soon as the boot finished I opened a terminal window and checked /proc/mdstat — sure enough it's only showing one drive.

But when I opened the "Disks" GUI utility (on Linux Mint - MATE) to find out which drive had failed, it reports both RAID members are OK and healthy.

When I run mdadm --examine on one of the drives it indicates that both drives are active in the array, but when I run it on the other drive it reports the array as "A." (i.e., one drive active, one missing).

What's extra odd is that /proc/mdstat only shows /dev/sdc1, so my assumption is that /dev/sdb1 has failed(?). But when I run --examine on each drive, the array shows as "AA" (both drives active) when I examine /dev/sdb1 and as "A." (one drive missing) when I examine /dev/sdc1, so that makes me wonder if maybe /dev/sdc1 is actually the one that's failed?

Or am I just misinterpreting this output altogether? Is the array actually perfectly fine with two active drives, and just showing one drive in /proc/mdstat for some reason?

Here's the output from /proc/mdstat and from --examine:

me@myhost:~$ cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdc1[0]
      976630336 blocks super 1.2 [2/1] [U_]

unused devices: <none>

me@myhost:~$ sudo mdadm --examine /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 77c520b4:61c7fbfc:c6747e05:39d9497e
           Name : myhost:0
  Creation Time : Tue Apr 26 15:28:37 2016
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 1953260976 sectors (931.39 GiB 1000.07 GB)
     Array Size : 976630336 KiB (931.39 GiB 1000.07 GB)
  Used Dev Size : 1953260672 sectors (931.39 GiB 1000.07 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=261864 sectors, after=304 sectors
          State : clean
    Device UUID : c2a9f4c7:a5d1e0a1:b194b9b5:f4f5e15d

    Update Time : Tue Feb 21 20:14:46 2023
  Bad Block Log : 512 entries available at offset 264 sectors
       Checksum : a5274f16 - correct
         Events : 2622


   Device Role : Active device 1
   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)

me@myhost:~$ sudo mdadm --examine /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 77c520b4:61c7fbfc:c6747e05:39d9497e
           Name : myhost:0
  Creation Time : Tue Apr 26 15:28:37 2016
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 1953260976 sectors (931.39 GiB 1000.07 GB)
     Array Size : 976630336 KiB (931.39 GiB 1000.07 GB)
  Used Dev Size : 1953260672 sectors (931.39 GiB 1000.07 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=304 sectors
          State : clean
    Device UUID : 9a573ffb:6749773a:966affd9:f3415a64

    Update Time : Sun Jul  7 14:23:08 2024
       Checksum : 9e59e6d9 - correct
         Events : 322313


   Device Role : Active device 0
   Array State : A. ('A' == active, '.' == missing, 'R' == replacing)

me@myhost:~$ sudo mdadm --examine --scan
ARRAY /dev/md/0  metadata=1.2 UUID=77c520b4:61c7fbfc:c6747e05:39d9497e name=myhost:0

2 Answers

/dev/sdb1's data is out of date: its metadata was last updated over a year ago.

You can see this from the Update Time:

    Update Time : Tue Feb 21 20:14:46 2023
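
A quick way to put the relevant fields from both members side by side (just a sketch; the device names are the ones from this question):

    # The member with the newer Update Time and the much higher Events count
    # holds the current data; here that is /dev/sdc1, so /dev/sdb1 is the stale one.
    sudo mdadm --examine /dev/sdb1 /dev/sdc1 | grep -E '/dev/|Update Time|Events|Array State'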
  • Thanks, so just to confirm — that seems to indicate /dev/sdb has failed? Commented Jul 9, 2024 at 3:28
  • @1337ingDisorder failed or kicked from the array for some other reason (cable problem, timeout, etc.). If you have system logs... from that long ago... might be worth checking. Commented Jul 9, 2024 at 11:57
  • Might also be worth doing a smartctl -a /dev/sdb and seeing if anything looks bad (eg reallocated sectors, or errors in the log). Commented Jul 9, 2024 at 14:24
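
For the record, a hedged sketch of those two checks (smartctl comes from the smartmontools package; the exact attribute names vary by drive, and journalctl -k only covers the current boot):

    # SMART overall health plus the attributes that usually betray a dying disk
    sudo smartctl -a /dev/sdb | grep -iE 'overall-health|Reallocated_Sector|Current_Pending|Offline_Uncorrectable|UDMA_CRC'

    # The drive's own error log, if it recorded any
    sudo smartctl -l error /dev/sdb

    # Kernel messages about the array and the drive from this boot
    sudo journalctl -k | grep -iE 'md0|sdb'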

Unfortunately mdadm does not update metadata for kicked drives anymore. That's how you get this confusing situation where mdadm --examine of the bad drive somehow looks good...

Even mdadm itself does not know whether a drive failed in the past. If no other drive declares it as missing in its own metadata, mdadm might start the array from a previously failed drive, and suddenly your filesystem travels back in time.
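
For what the kernel currently believes about the running array (as opposed to what each member's superblock claims), a quick check is:

    # Shows the array state (clean vs. degraded), the active members,
    # and which slot is reported as removed/missing.
    sudo mdadm --detail /dev/md0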

When looking at mdadm --examine, you have to consider the output as a whole across all members, not just a single drive. Check Update Time, Events, and Array State as reported by the other drives.

Since comparing mdadm --examine output across multiple drives by eye is tedious, you can kind of cheat and use mdadm --examine /dev/... | sort to see more directly whether Array UUID, Update Time, State, and other values match across drives.
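
For example, a hypothetical variant of that trick for the two members in this question (uniq -c just makes mismatches stand out):

    # Lines that are identical on both members get a count of 2;
    # anything with a count of 1 (Update Time, Events, Array State, ...) differs.
    sudo mdadm --examine /dev/sdb1 /dev/sdc1 | sort | uniq -c | sort -rn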

When a drive fails, you are supposed to react in a timely manner. If you are running mdadm in monitor mode and have a working mail system, mdadm should have sent you a failure notification mail as well.
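
A minimal monitoring setup along those lines might look like this (the address is illustrative; Debian-based systems such as Mint keep the config in /etc/mdadm/mdadm.conf):

    # In /etc/mdadm/mdadm.conf: where failure alerts should be mailed
    MAILADDR you@example.com

    # Send a TestMessage alert for every array to confirm mail delivery works
    sudo mdadm --monitor --scan --oneshot --test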
