Skip to main content
replaced https://tools.ietf.org/html/draft with https://datatracker.ietf.org/doc/html/draft
Source Link

This is not just limited to tmpfs but has been a concern cited with NFSv4NFSv4.

This is not just limited to tmpfs but has been a concern cited with NFSv4.

This is not just limited to tmpfs but has been a concern cited with NFSv4.

Bounty Awarded with 500 reputation awarded by frostschutz
Source Link
Brian Redbeard
  • 3.2k
  • 22
  • 45

First off you're not alone in puzzling about these sorts of issues.

This is not just limited to tmpfs but has been a concern cited with NFSv4.

If an application reads 'holes' in a sparse file, the file system converts empty blocks into "real" blocks filled with zeros, and returns them to the application.

When md5sum is attempting to scan a file it explicitly chooses to do this in sequential order, which makes a lot of sense based on what md5sum is attempting to do.

As there are fundamentally "holes" in the file, this sequential reading is going to (in some situations) cause a copy on write like operation to fill out the file. This then gets into a deeper issue around whether or not fallocate() as implemented in the filesystem supports FALLOC_FL_PUNCH_HOLE.

Fortunately, not only does tmpfs support this but there is a mechanism to "dig" the holes back out.

Using the CLI utility fallocate we can successfuly detect and re-dig these holes.

As per man 1 fallocate:

-d, --dig-holes
      Detect and dig holes.  This makes the file sparse in-place, without
      using extra disk space.  The minimum size of the hole depends on
      filesystem I/O  block size (usually 4096 bytes).  Also, when using
      this option, --keep-size is implied.  If no range is specified by
      --offset and --length, then the entire file is analyzed for holes.

      You can think of this option as doing a "cp --sparse" and then
      renaming the destination file to the original, without the need for
      extra disk space.

      See --punch-hole for a list of supported filesystems.

fallocate operates on the file level though and when you are running md5sum against a block device (requesting sequential reads) you're tripping up on the exact gap between how the fallocate() syscall should operate. We can see this in action:

In action, using your example we see the following:

$ fs=$(mktemp -d)
$ echo ${fs}
/tmp/tmp.ONTGAS8L06
$ dd if=/dev/zero of=${fs}/sparse100M conv=sparse seek=$((100*2*1024-1)) count=1 2>/dev/null
$ echo "Before:" "$(ls ${fs}/sparse100M -s)"
Before: 0 /tmp/tmp.ONTGAS8L06/sparse100M
$ sudo losetup /dev/loop0 ${fs}/sparse100M
$ sudo md5sum /dev/loop0
2f282b84e7e608d5852449ed940bfc51  /dev/loop0
$ echo "After:" "$(ls ${fs}/sparse100M -s)"
After: 102400 /tmp/tmp.ONTGAS8L06/sparse100M
$ fallocate -d ${fs}/sparse100M
$ echo "After:" "$(ls ${fs}/sparse100M -s)"
After: 0 /tmp/tmp.ONTGAS8L06/sparse100M

Now... that answers your basic question. My general motto is "get weird" so I dug in further...

$ fs=$(mktemp -d)
$ echo ${fs}
/tmp/tmp.ZcAxvW32GY
$ dd if=/dev/zero of=${fs}/sparse100M conv=sparse seek=$((100*2*1024-1)) count=1 2>/dev/null
$ echo "Before:" "$(ls ${fs}/sparse100M -s)"
Before: 0 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo losetup /dev/loop0 ${fs}/sparse100M
$ echo "After:" "$(ls ${fs}/sparse100M -s)"
After: 1036 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo md5sum ${fs}/sparse100M
2f282b84e7e608d5852449ed940bfc51  /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls ${fs}/sparse100M -s)"
After: 1036 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d ${fs}/sparse100M
$ echo "After:" "$(ls ${fs}/sparse100M -s)"
After: 520 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo md5sum ${fs}/sparse100M
2f282b84e7e608d5852449ed940bfc51  /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls ${fs}/sparse100M -s)"
After: 520 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d ${fs}/sparse100M
$ echo "After:" "$(ls ${fs}/sparse100M -s)"
After: 516 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d ${fs}/sparse100M
$ sudo md5sum ${fs}/sparse100M
2f282b84e7e608d5852449ed940bfc51  /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls ${fs}/sparse100M -s)"
After: 512 /tmp/tmp.ZcAxvW32GY/sparse100M
$ fallocate -d ${fs}/sparse100M
$ echo "After:" "$(ls ${fs}/sparse100M -s)"
After: 0 /tmp/tmp.ZcAxvW32GY/sparse100M
$ sudo md5sum ${fs}/sparse100M
2f282b84e7e608d5852449ed940bfc51  /tmp/tmp.ZcAxvW32GY/sparse100M
$ echo "After:" "$(ls ${fs}/sparse100M -s)"
After: 0 /tmp/tmp.ZcAxvW32GY/sparse100M

You see that merely the act of performing the losetup changes the size of the sparse file. So this becomes an interesting combination of where tmpfs, the HOLE_PUNCH mechanism, fallocate, and block devices intersect.