I have a btrfs file system that I've scrubbed a few times, and the scrubs found some errors. If I check the device stats, I see a nonzero generation_errs count:

$ sudo btrfs device stats /dev/nvme0n1p5
[/dev/nvme0n1p5].write_io_errs   0
[/dev/nvme0n1p5].read_io_errs    0
[/dev/nvme0n1p5].flush_io_errs   0
[/dev/nvme0n1p5].corruption_errs 0
[/dev/nvme0n1p5].generation_errs 3

What are these generation_errs? Is it something to be worried about?

FWIW, here are the error messages found in the kernel log after a scrub:

5/22/16 12:00 AM BTRFS warning (device nvme0n1p5): checksum/header error at logical 343949312 on dev /dev/nvme0n1p5, sector 671776: metadata leaf (level 0) in tree 321435615232
5/22/16 12:00 AM BTRFS warning (device nvme0n1p5): checksum/header error at logical 343949312 on dev /dev/nvme0n1p5, sector 671776: metadata leaf (level 0) in tree 321376649216
5/22/16 12:00 AM BTRFS warning (device nvme0n1p5): checksum/header error at logical 343949312 on dev /dev/nvme0n1p5, sector 671776: metadata leaf (level 0) in tree 109330432
5/22/16 12:00 AM BTRFS warning (device nvme0n1p5): checksum/header error at logical 343949312 on dev /dev/nvme0n1p5, sector 671776: metadata leaf (level 0) in tree 109330432
5/22/16 12:00 AM BTRFS error (device nvme0n1p5): bdev /dev/nvme0n1p5 errs: wr 0, rd 0, flush 0, corrupt 0, gen 2
5/22/16 12:00 AM BTRFS error (device nvme0n1p5): unable to fixup (regular) error at logical 343949312 on dev /dev/nvme0n1p5

2 Answers

The btrfs Wiki glossary says this about what a generation is:

  • generation

    An internal counter which updates for each transaction. When a metadata block is written (using copy on write), current generation is stored in the block, so that blocks which are too new (and hence possibly inconsistent) can be identified.

Another entry mentions that

Under normal circumstances the generation numbers must match. A mismatch can be caused by a lost write after a crash (ie. a dangling block "pointer"; software bug, hardware bug), misdirected write (the block was never written to that location; software bug, hardware bug).
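
To make the quoted description concrete, here is a toy Python model of the check it describes. It is only a sketch: `Block`, `ParentPointer`, and `generation_matches` are hypothetical names invented for illustration, not btrfs structures or APIs, and the generation values are made up.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Toy stand-in for a metadata block (not a real btrfs structure)."""
    bytenr: int      # logical address of the block
    generation: int  # transaction counter stored in the block when it was written


@dataclass
class ParentPointer:
    """Toy stand-in for the pointer a parent node keeps to a child block."""
    bytenr: int
    expected_generation: int  # generation the parent recorded for the child


def generation_matches(ptr: ParentPointer, block: Block) -> bool:
    """True if the block found at the pointed-to address has the generation
    the parent expects."""
    return ptr.bytenr == block.bytenr and ptr.expected_generation == block.generation


# The parent says the child was written in (made-up) transaction 1042.
ptr = ParentPointer(bytenr=343949312, expected_generation=1042)

# Lost write: the location still holds an older block from transaction 1040.
stale = Block(bytenr=343949312, generation=1040)

# Normal case: the child block actually reached the disk.
fresh = Block(bytenr=343949312, generation=1042)

print(generation_matches(ptr, fresh))  # the generations agree
print(generation_matches(ptr, stale))  # mismatch, i.e. what would count as a generation error
```

In this model, a lost or misdirected write shows up as a block whose stored generation disagrees with what its parent recorded, which is the situation the glossary entry describes.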

That doesn't tell me very much, and others have asked the same question; see, for example, this mail thread.

Actually, the last post in that thread mentions something half useful: a "generation error" is "an indication that blocks have not been written", which basically echoes what the Wiki says.

So, with that information we can draw some conclusions:

  • The user-facing documentation for btrfs does not fully explain the output of its tools (the Wiki even says "For now, most of the information exists in people's heads.")

  • There were a few errors writing metadata to disk, which, yes, could indicate a problem.

By answering this question, I hope that some btrfs guru pops up and gives you a proper answer to the question "What do I do about it?"

Your next port of call may be asking on a btrfs mailing list, such as the one mentioned in the Wiki (I would do this now if I were you).

I'm no expert, but I had a similar report from btrfs dev stats -c following a scheduled monthly scrub.

4000 corruption_errs on sda
14 generation_errs on sdd

As stated in the btrfs documentation,

corruption_errs A block checksum mismatched or a corrupted metadata header was found.

generation_errs The block generation does not match the expected value (eg. stored in the parent node).
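
If you want to check these counters from a script, the output format shown in the question is easy to parse. Here is a minimal Python sketch, assuming every line looks like `[device].counter value`. `parse_device_stats` and `nonzero_counters` are hypothetical helper names, and the sample text is copied from the question rather than read from a live system.

```python
import re

# One line of `btrfs device stats` output, e.g.
# "[/dev/nvme0n1p5].generation_errs 3"
LINE_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text):
    """Parse stats output into {device: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:  # silently skip anything that is not a counter line
            stats.setdefault(m["dev"], {})[m["counter"]] = int(m["value"])
    return stats


def nonzero_counters(stats):
    """Keep only devices and counters whose value is not zero."""
    result = {}
    for dev, counters in stats.items():
        bad = {name: value for name, value in counters.items() if value}
        if bad:
            result[dev] = bad
    return result


# Sample output taken from the question above.
SAMPLE = """\
[/dev/nvme0n1p5].write_io_errs   0
[/dev/nvme0n1p5].read_io_errs    0
[/dev/nvme0n1p5].flush_io_errs   0
[/dev/nvme0n1p5].corruption_errs 0
[/dev/nvme0n1p5].generation_errs 3
"""

print(nonzero_counters(parse_device_stats(SAMPLE)))
```

On a real system you would feed the function the text printed by `btrfs device stats` for your own mount point or device instead of `SAMPLE`.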

I run RAID 10, so I expected scrub to fix such checksum errors by replacing the bad headers with healthy ones from another copy. It didn't appear to do this (or, if it does actually fix them, it at least doesn't automatically reset the error counters, which is a little confusing IMO).

Anyway, I decided to run a balance instead, and then scrubbed the disks. This did replace the corrupted checksums with healthy ones and the errors are gone.

  • Is a balance recommended, or required? IDK.
  • Could running a balance while corruption errors exist have possibly spread the corrupted headers to other disks? IDK.
  • Did it fix my issue? Yes, YMMV.
  • As far as I am aware, scrubbing does not reset the error counts, so you will see the error count continuously increase as the scrub progresses. If you want to reset the error counts, use the -z flag to btrfs device stats (this may be worth doing after every scrub). A balance should not be required to fix corruption errors. Commented Sep 8, 2020 at 1:34
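
Since the counters are cumulative, one way to tell whether a particular scrub found anything new is to compare snapshots taken before and after it. Here is a minimal Python sketch, assuming the counters have already been read into plain dicts. `new_errors` is a hypothetical helper and the numbers are illustrative only.

```python
def new_errors(before, after):
    """Return the counters that increased between two snapshots,
    mapped to how much they increased."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] > before.get(name, 0)
    }


# Illustrative snapshots of one device's counters (made-up values).
before_scrub = {
    "write_io_errs": 0,
    "read_io_errs": 0,
    "flush_io_errs": 0,
    "corruption_errs": 0,
    "generation_errs": 2,
}
after_scrub = dict(before_scrub, generation_errs=3)

delta = new_errors(before_scrub, after_scrub)
print(delta)
```

An empty result would mean the scrub added nothing to the counters, which you can see without resetting them with -z first.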
