I'm having an odd performance issue when trying to back up my ZFS filesystems.
I can tar the contents of a ZFS filesystem at 100+ MB/s, but zfs send trickles the data out at maybe 5 MB/s. The filesystem only has 5 or 6 snapshots. A tar takes about 1.5 hours; a zfs send takes more than 12 hours!
In both cases, the destination is a file on another pool. (i.e. zfs send tank/myfs > /backup/myfs.zfsbackup vs. tar -cf /backup/myfs.tar ./myfs)
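For reference, zfs send reads from a snapshot rather than the live filesystem, so the full sequence looks roughly like this (the snapshot name "backup1" is illustrative, not the one I actually used):

```shell
# zfs send needs a snapshot to read from; "backup1" is just an example name.
zfs snapshot tank/myfs@backup1
zfs send tank/myfs@backup1 > /backup/myfs.zfsbackup

# The tar equivalent reads the live filesystem directly.
tar -cf /backup/myfs.tar ./myfs
```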
My first thought was fragmentation, but if that were the case, wouldn't tar be just as slow?
I'm getting decent enough overall disk performance, but my backups are literally taking forever.
I'm running Solaris 11.4 on x64 hardware. Conceptually, the issue may be similar on ZFS on Linux, but I'm not that familiar with the Linux variant.
I ran the DTrace script provided in the answer below for approximately 12 minutes while a zfs send was running:
dtrace -n 'profile:::profile-1001hz /arg0/ { @[ stack() ] = count(); }'
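To keep the aggregation output manageable, the same profile probe can be trimmed to the hottest stacks in an END clause (a sketch; the trunc() count of 10 is arbitrary):

```shell
# Profile on-CPU kernel stacks at ~1001 Hz and, on exit (Ctrl-C), print
# only the 10 most frequently sampled stacks instead of the full list.
dtrace -n 'profile:::profile-1001hz /arg0/ { @[ stack() ] = count(); }
           END { trunc(@, 10); }'
```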
I'm not sure how to interpret the results. Two stack traces in the aggregated output contained a good number of zfs calls:
zfs`zfs_fletcher_4_native+0x79
zfs`zfs_checksum_compute+0x181
zfs`zio_checksum_compute+0x1d6
zfs`zio_checksum_compute_dispatch+0x28
zfs`zio_checksum_generate+0x59
zfs`zio_execute+0xb4
genunix`taskq_thread+0x3d5
unix`thread_start+0x8
1041
unix`bcopy+0x55a
genunix`uiomove+0xb3
zfs`dmu_xuio_transform+0x83
zfs`zfs_write+0x78a
genunix`fop_write+0xf5
genunix`vn_rdwr_impl+0x1f3
genunix`vn_rdwr_uiov+0x63
zfs`dump_buffer_flush+0x8e
zfs`dump_buffer_append+0x85
zfs`dump_bytes_impl+0x49
zfs`dump_bytes+0x49
zfs`dump_record+0x190
zfs`dump_data+0x26a
zfs`backup_cb+0x4b5
zfs`traverse_visitbp+0x3df
zfs`traverse_visitbp+0x8e4
zfs`traverse_visitbp+0x8e4
zfs`traverse_dnode+0x1dc
zfs`traverse_visitbp+0x6d2
zfs`traverse_visitbp+0x8e4
1183
The highest counts by far were CPU idle stacks...
unix`mach_cpu_idle+0x17
unix`cpu_idle+0x2b7
unix`cpu_idle_adaptive+0x19
unix`idle+0x11e
unix`thread_start+0x8
1147665
unix`mach_cpu_idle+0x17
unix`cpu_idle+0x2b7
unix`cpu_idle_adaptive+0x19
unix`idle+0x11e
unix`thread_start+0x8
2462890
During the zfs send, the drives are busy, but there aren't any waits and I don't think service times are all that bad...
extended device statistics
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
157.0 0.0 4.9 0.0 0.0 1.6 0.0 10.5 0 77 c0t5000C500A22D9330d0
154.0 0.0 4.9 0.0 0.0 1.7 0.0 11.0 0 82 c0t5000C500A232AFA6d0
186.0 0.0 6.4 0.0 0.0 2.4 0.0 12.7 0 93 c0t5000C500A24AD833d0
185.0 0.0 6.3 0.0 0.0 1.8 0.0 9.9 0 79 c0t5000C500A243C8DEd0
During a tar, disk usage seems fairly similar (r/s, service times, %b, and so on), and yet the amount of data being read is vastly different:
extended device statistics
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
158.0 0.0 33.3 0.0 0.0 1.9 0.0 11.9 0 86 c0t5000C500A22D9330d0
190.0 0.0 31.9 0.0 0.0 1.6 0.0 8.3 0 75 c0t5000C500A232AFA6d0
170.0 0.0 37.1 0.0 0.0 1.7 0.0 9.7 0 80 c0t5000C500A24AD833d0
168.0 0.0 38.4 0.0 0.0 1.7 0.0 10.1 0 80 c0t5000C500A243C8DEd0
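Dividing Mr/s by r/s makes the difference concrete: both workloads issue a similar number of reads per second, but the send's reads are far smaller. A quick sanity check on the first disk from each sample (numbers copied from the iostat output above):

```shell
# Average read size = (Mr/s * 1024) / (r/s), in KB per read operation.
awk 'BEGIN {
  printf "zfs send: %.0f KB per read\n", 4.9  * 1024 / 157
  printf "tar:      %.0f KB per read\n", 33.3 * 1024 / 158
}'
```

So at a similar IOPS rate, tar is moving roughly 6-7x more data per read operation, which lines up with the throughput gap.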
zfs send can have significant periods of time when the send "stalls" and not much data is moved. I seem to remember that the source filesystem's disks were busy with reads when that happened. To figure out what's going on, you'll probably have to run the DTrace and iostat diagnostics while the "stall" is happening. Enabling sar with a short collection interval (5-10 sec) might be necessary.
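To catch a stall in the act, something like the following can run alongside the send (Solaris syntax; the interval, count, and log paths are arbitrary choices):

```shell
# Sample extended device statistics every 5 seconds, 120 samples (~10 minutes),
# in the background, so the logs can be lined up with when a stall occurred.
iostat -xn 5 120 > /tmp/send-iostat.log &
sar -d 5 120 > /tmp/send-sar.log &
```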