TL;DR: long story short, I ended up with a Btrfs RAID1 filesystem that lists /dev/sde1 twice (as device IDs 1 and 2). Btrfs refuses to mount it read-write, saying one device is missing. How do I figure out which of these device IDs is the working one so I can remove the other, and how do I add the correct second drive (/dev/sdb1)? Currently, adding a drive fails because I can only mount read-only.
I have two external hard drives with Btrfs in RAID1 (mirror). Drive A is fine, but drive B reported millions of errors during a regular scrub. In testing, device B itself seems healthy, so my guess is that the two drives simply got out of sync (the host is a laptop and rides out a power outage on its battery, but the external drives do not, so perhaps one came back online before the other). I wanted to rebuild the mirror from device A onto device B.
After some searching, I figured the replace subcommand was the thing to use. I want to replace device B with, yup, device B. Naturally, I first checked whether btrfs would accept this command:
btrfs replace start /dev/deviceB /dev/deviceB /mountpoint
Unfortunately that didn't work. The man page says: "On a live filesystem, [start] duplicate[s] the data to the target device which is currently stored on the source device." So I just passed the other available device instead, because it can duplicate from there:
btrfs replace start /dev/deviceA /dev/deviceB /mountpoint
I should have read the man page more carefully, because later on it says "After completion of the operation, the source device is removed from the filesystem." So device A was dropped, and the filesystem now contains only /dev/deviceB. The original (corrupt) device B entry, however, was never removed.
So now I have this situation:
$ btrfs device usage /mountpoint
/dev/sde1, ID: 1
   Device size: 3.64TiB
   Device slack: 0.00B
   Data,single: 1.00GiB
   Data,RAID1: 2.00TiB
   Data,DUP: 40.91GiB
   Metadata,single: 1.00GiB
   Metadata,RAID1: 5.00GiB
   Metadata,DUP: 3.00GiB
   System,single: 32.00MiB
   System,RAID1: 32.00MiB
   System,DUP: 128.00MiB
   Unallocated: 1.59TiB

/dev/sde1, ID: 2
   Device size: 3.64TiB
   Device slack: 0.00B
   Data,RAID1: 2.00TiB
   Metadata,RAID1: 5.00GiB
   System,RAID1: 32.00MiB
   Unallocated: 1.63TiB
(Where /dev/sde1 is device B. I am able to mount it with -o degraded,ro.)
How should I resolve this situation?
I tried adding device A (sdb1) but that fails, saying "ERROR: error adding device '/dev/sdb1': Read-only file system". I am not sure how to proceed, as I cannot tell which device ID is which, so removing either (in order to let it mount rw) might be catastrophic. And I'm not sure removing a device is the best course of action at this point anyway. Perhaps I should (after figuring out which device ID it is) use replace with a device ID as argument instead?
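One read-only way to tell which device ID a given partition carries might be to read the devid stored in its own superblock. The following is only a sketch: it assumes that the output of btrfs inspect-internal dump-super contains a dev_item.devid line, and devid_from_dump is a hypothetical helper name, not part of btrfs-progs.

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfs-progs): reads superblock dump text
# on stdin and prints the value of the dev_item.devid field, if present.
devid_from_dump() {
    awk '$1 == "dev_item.devid" { print $2; exit }'
}

# Intended use (as root; dump-super only reads from the device):
#   btrfs inspect-internal dump-super /dev/sde1 | devid_from_dump
```

If this prints 1 for /dev/sde1, then ID 1 would be the entry that is physically present and ID 2 the missing one, or the other way around. This only identifies the device; it does not say whether the data on it is good.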
The filesystem on device A is no longer recognized as Btrfs, and when inspecting it with a hex dump, it indeed looks invalid: it used to contain the literal string BTRFS somewhere near the beginning (iirc just after 0x10000) but no longer does. The data still seems to be there, just not the correct header (the first non-zero data is now at 0x400000).
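The header check can also be done without eyeballing a hex dump. As far as I know, the primary Btrfs superblock sits at 64 KiB (0x10000) and its 8-byte magic, _BHRfS_M, is at offset 0x40 inside it (so 0x10040 on the device), with a mirror copy of the superblock at 64 MiB. A minimal sketch under those assumptions; has_btrfs_super is a hypothetical helper name and only reads from the device:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the 8 bytes at <superblock offset> + 0x40
# equal the Btrfs magic "_BHRfS_M" (hex 5f 42 48 52 66 53 5f 4d).
# $1 = device or image file, $2 = byte offset of the superblock (decimal)
has_btrfs_super() {
    magic=$(od -An -tx1 -j "$(( $2 + 64 ))" -N 8 "$1" 2>/dev/null | tr -d ' \n')
    [ "$magic" = "5f42485266535f4d" ]
}

# Intended use (as root), checking the primary copy and the 64 MiB mirror:
#   has_btrfs_super /dev/sdb1 65536    && echo "primary superblock magic found"
#   has_btrfs_super /dev/sdb1 67108864 && echo "mirror superblock magic found"
```

If the primary copy is gone but the mirror still has the magic, that would at least confirm that only the start of device A was overwritten.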