
If I defragment files on btrfs with the command

btrfs filesystem defrag --step 1G file

Everything is fine. filefrag -v file clearly shows that the extent count has significantly decreased.

Things are very different when I deal with compressed files. First, filefrag reports a huge number of extents:

Filesystem type is: 9123683e
File size of file is 85942272 (20982 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      31:     607198..    607229:     32:             encoded
   1:       32..      63:     609302..    609333:     32:     607230: encoded
   2:       64..      95:     609314..    609345:     32:     609334: encoded
   3:       96..     127:     609326..    609357:     32:     609346: encoded
...
 648:    20928..   20959:     704298..    704329:     32:     704299: encoded
 649:    20960..   20981:     691987..    692008:     22:     704330: last,encoded,eof
file: 650 extents found

Second, the btrfs filesystem defragment command returns on the spot, without any error report, and with an unchanged filefrag output.

My impression is that fragmentation of compressed files is not considered an issue on btrfs filesystems at all. However, my ears clearly tell me otherwise: yes, it is an issue for me.

So, how can I defragment compressed files on btrfs? And how could I even see whether they are contiguous at all, looking not at their encoded ( == compressed ) extents, but at their compressed blocks on the HDD?

  • Note: "It does not work" is also an answer. Commented May 12 at 14:23

2 Answers


As far as I know, this is "normal behavior".

Filesystem-level compression is not done at the level of the whole file, because programs expect to read (and write) arbitrary locations instead of being forced to go from byte 0 each time – while compression algorithms work by keeping some kind of continuously updated state. To extract from the middle, you need to know the state of the decompressor, which could be influenced by all of the previous contents.

So instead such filesystems apply compression to each chunk individually – e.g. the compression state starts anew every 32 kB, and so if a program wants to read from the middle of that file, the filesystem only needs to decompress the single 32 kB block rather than many megabytes preceding it; and when the file is written to, it only needs to update the 32 kB block and not everything following it.
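As a rough illustration of the idea only (this is not how btrfs actually lays data out on disk; the sample file and tools here are just for demonstration, using the 128 KiB figure that btrfs uses for compressed extents), compare compressing a file as one stream with compressing it as independent chunks:

    # Illustration only: not how btrfs stores data, just the general idea.
    seq 1 1000000 > sample                 # some compressible test data (~7 MB)

    # One stream: to read data near the end, gunzip must decompress from byte 0.
    gzip -c sample > sample.gz

    # Independent 128 KiB chunks (btrfs compresses in 128 KiB extents):
    split -b 128K sample sample.part.
    gzip sample.part.*

    # Any single chunk decompresses on its own, without the chunks before it:
    gunzip -c sample.part.ab.gz | head -c 64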

(Compare to extracting a single file out of a large .tar.gz or a "solid" RAR/7z – the decompression has to start at the beginning of the .gz until it reaches the location of the file needed.)

That's what filefrag represents – each "compression boundary" shows up as a separate encoded fragment.

It doesn't really count as fragmentation, though, because the physical_offset column still indicates that all those fragments are stored more-or-less linearly (even overlapping, which is probably a Btrfs artifact), so even if it were a mechanical HDD, it wouldn't need to do much seeking-around.
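If you want to check that yourself, one way is to look at the gaps between consecutive extents' physical positions in the filefrag -v output. The following is only a sketch that assumes the output format shown in the question:

    # Sketch: print the gap (in 4096-byte blocks) between the end of one extent
    # and the physical start of the next. Zero or small values mean the
    # "fragments" sit almost back to back; negative values are the overlaps
    # mentioned above.
    filefrag -v file | awk '
      /^ *[0-9]+:/ {
        gsub(/\.\./, " "); gsub(/:/, " ")    # turn ".." and ":" into spaces
        if (prev_end != "") print "gap before extent " $1 ": " ($4 - prev_end - 1)
        prev_end = $5                        # physical end of this extent
      }'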

(Fragmentation is an issue on Btrfs, extremely so; but it's not an issue on SSDs, where the performance hit is usually not noticeable.)

4
  • Afaik it is still an issue even on SSDs, but only to a much lesser extent: 1) block device-level readahead still behaves badly if it does not know about the fs; 2) an SSD can only trim and reflash whole blocks, and all the blocks have a limited wear count (maybe 100,000 or so). If the FS blocks are smaller than the SSD blocks and there is extreme fragmentation, it will result in more wear (as the average count of block-level trims per fs-level block write will be higher). Commented May 12 at 18:19
  • I use btrfs on an HDD and it defragments well; after a defrag, it is fine. Actually, I have meanwhile done a full-fs defrag (note: online, while my productive processes were running on it!); it showed progress, and this time it worked a lot on the newly created files. So btrfs on an HDD is not so bad at all, we only need to know how to handle it :-) That online shrink support, online defrag support, these were unthinkable... Commented May 12 at 18:22
  • I take it the "32kB" reference in your answer is generic, and is fine as-is (on Btrfs, compression uses 128 KiB extents; see the compression documentation). Commented May 13 at 6:59
  • It's generic, from the unitless "32" in OP's output. NTFS compression uses 64k. I have no idea what ZFS uses. Commented May 13 at 7:00

Here are my partial findings.

I did a recursive defragmentation (note: online, while my production processes were running on the system!). It also showed progress, listing which file it was working on.

This time it worked a lot on the compressed files, too. This makes it likely, imho, that something prevented the single-file defrag but allowed the full-filesystem recursive one.
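For the record, the pass I mean was roughly the following; /mnt/data is only a placeholder for your own mount point, and the -c variant is just an optional extra mentioned for completeness:

    # Full-filesystem pass: -r recurses into the directory,
    # -v prints each file as it is processed.
    sudo btrfs filesystem defragment -r -v /mnt/data

    # Optionally, -czstd (or -czlib / -clzo) re-compresses files while
    # defragmenting them:
    # sudo btrfs filesystem defragment -r -v -czstd /mnt/data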

As @Grawity also writes, the filefrag tool is not even a btrfs tool. It cannot see anything below the Linux filesystem interface, so it is nothing unusual that it can only show the highest, compressed-block level.

Either there is no way to see the lower, physical block-level layout of the compressed files, or if there is, it must be a btrfs-specific tool and not filefrag. If someone can tell me, I will happily accept that as the answer (yes, that would be the solution).

The most likely cause of the failure of the single-file defrag was that the defrag tool needed information about the block allocation map of the partition, which it could only collect during a recursive, full-fs defragmentation.

Now my fs is faster and my HDD is silent, even when working on the compressed files.
