I'm having an odd performance issue when trying to back up my ZFS filesystems.
I can tar the contents of a ZFS filesystem at 100+ MB/s, but zfs send trickles the data out at maybe 5 MB/s. The filesystem only has 5 or 6 snapshots. A tar takes about 1.5 hours; a zfs send takes more than 12 hours!
In both cases, the destination is a file on another pool. (i.e. zfs send tank/myfs > /backup/myfs.zfsbackup vs. tar -cf /backup/myfs.tar ./myfs)
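For reference, zfs send reads from a snapshot rather than the live filesystem, so the full sequence looks roughly like this (the snapshot name "backup1" is illustrative, not the one I actually used):

```shell
# zfs send needs a snapshot to read from; "backup1" is just an example name.
zfs snapshot tank/myfs@backup1
zfs send tank/myfs@backup1 > /backup/myfs.zfsbackup

# The tar equivalent reads the live filesystem directly.
tar -cf /backup/myfs.tar ./myfs
```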
My first thought was fragmentation, but if that were the case, wouldn't tar be just as slow?
I'm getting decent enough overall disk performance, but my backups are literally taking forever.
I'm running Solaris 11.4 on x64 hardware. Conceptually, the issue may be similar on ZFS on Linux, but I'm not that familiar with the Linux variant.
I ran the DTrace script provided in the answer below for approximately 12 minutes while a zfs send was running:
dtrace -n 'profile:::profile-1001hz /arg0/ { @[ stack() ] = count(); }'
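To keep the aggregation output manageable, the same profile probe can be trimmed to the hottest stacks in an END clause (a sketch; the trunc() count of 10 is arbitrary):

```shell
# Profile on-CPU kernel stacks at ~1001 Hz and, on exit (Ctrl-C), print
# only the 10 most frequently sampled stacks instead of the full list.
dtrace -n 'profile:::profile-1001hz /arg0/ { @[ stack() ] = count(); }
           END { trunc(@, 10); }'
```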
I'm not sure how to interpret the results. Two stack traces in the aggregated output contained a good number of zfs calls:
zfs`zfs_fletcher_4_native+0x79
zfs`zfs_checksum_compute+0x181
zfs`zio_checksum_compute+0x1d6
zfs`zio_checksum_compute_dispatch+0x28
zfs`zio_checksum_generate+0x59
zfs`zio_execute+0xb4
genunix`taskq_thread+0x3d5
unix`thread_start+0x8
1041
unix`bcopy+0x55a
genunix`uiomove+0xb3
zfs`dmu_xuio_transform+0x83
zfs`zfs_write+0x78a
genunix`fop_write+0xf5
genunix`vn_rdwr_impl+0x1f3
genunix`vn_rdwr_uiov+0x63
zfs`dump_buffer_flush+0x8e
zfs`dump_buffer_append+0x85
zfs`dump_bytes_impl+0x49
zfs`dump_bytes+0x49
zfs`dump_record+0x190
zfs`dump_data+0x26a
zfs`backup_cb+0x4b5
zfs`traverse_visitbp+0x3df
zfs`traverse_visitbp+0x8e4
zfs`traverse_visitbp+0x8e4
zfs`traverse_dnode+0x1dc
zfs`traverse_visitbp+0x6d2
zfs`traverse_visitbp+0x8e4
1183
The highest counts by far were CPU idle stacks...
unix`mach_cpu_idle+0x17
unix`cpu_idle+0x2b7
unix`cpu_idle_adaptive+0x19
unix`idle+0x11e
unix`thread_start+0x8
1147665
unix`mach_cpu_idle+0x17
unix`cpu_idle+0x2b7
unix`cpu_idle_adaptive+0x19
unix`idle+0x11e
unix`thread_start+0x8
2462890
During the zfs send, the drives are busy, but there aren't any waits and I don't think service times are all that bad...
extended device statistics
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
157.0 0.0 4.9 0.0 0.0 1.6 0.0 10.5 0 77 c0t5000C500A22D9330d0
154.0 0.0 4.9 0.0 0.0 1.7 0.0 11.0 0 82 c0t5000C500A232AFA6d0
186.0 0.0 6.4 0.0 0.0 2.4 0.0 12.7 0 93 c0t5000C500A24AD833d0
185.0 0.0 6.3 0.0 0.0 1.8 0.0 9.9 0 79 c0t5000C500A243C8DEd0
During a tar, disk usage seems fairly similar (r/s, service times, %b, and so on), and yet the amount of data being read is vastly different:
extended device statistics
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
158.0 0.0 33.3 0.0 0.0 1.9 0.0 11.9 0 86 c0t5000C500A22D9330d0
190.0 0.0 31.9 0.0 0.0 1.6 0.0 8.3 0 75 c0t5000C500A232AFA6d0
170.0 0.0 37.1 0.0 0.0 1.7 0.0 9.7 0 80 c0t5000C500A24AD833d0
168.0 0.0 38.4 0.0 0.0 1.7 0.0 10.1 0 80 c0t5000C500A243C8DEd0
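Dividing Mr/s by r/s makes the difference concrete: both workloads issue a similar number of reads per second, but the send's reads are far smaller. A quick sanity check on the first disk from each sample (numbers copied from the iostat output above):

```shell
# Average read size = (Mr/s * 1024) / (r/s), in KB per read operation.
awk 'BEGIN {
  printf "zfs send: %.0f KB per read\n", 4.9  * 1024 / 157
  printf "tar:      %.0f KB per read\n", 33.3 * 1024 / 158
}'
```

So at a similar IOPS rate, tar is moving roughly 6-7x more data per read operation, which lines up with the throughput gap.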
zfs send can have significant periods of time when the send "stalls" and not much data is moved. I seem to remember that the source filesystem's disks were busy with reads when that happened. To figure out what's going on, you'll probably have to run the DTrace and iostat diagnostics while the "stall" is happening. Enabling sar with a short collection interval (5-10 sec) might be necessary.
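To catch a stall in the act, something like the following can run alongside the send (Solaris syntax; the interval, count, and log paths are arbitrary choices):

```shell
# Sample extended device statistics every 5 seconds, 120 samples (~10 minutes),
# in the background, so the logs can be lined up with when a stall occurred.
iostat -xn 5 120 > /tmp/send-iostat.log &
sar -d 5 120 > /tmp/send-sar.log &
```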