mikeserv
  • 59.4k
  • 10
  • 122
  • 242

Please try your yes test again with an actual blocking factor, like:

yes | dd ibs=1k obs=1kx1k |   # input bs = 1k, output bs = 1k * 1k
dd bs=1kx1k count=10 of=out   # input and output bs = 1k * 1k

If your dd is POSIX-conformant, your results should look like this (timings will vary):

10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.127955 s, 81.9 MB/s
$ ls -l out
-rw-r--r-- 1 mikeserv mikeserv 10485760 Mar 23 17:37 out

You have to synchronize i/o with dd to handle non-seekable inputs such as pipes. In other words, make the pipe buffering explicit and it ceases to be a problem. That's what dd is for. I suggest you read through the POSIX spec for dd, and in particular learn the different implications of setting each of ibs=, obs=, cbs=, and bs=.

In the case above, the unknown quantity is the number of bytes yes write()s to the pipe each time. dd is specified to block its input, and so its bs= argument, when no conversion other than sync, noerror, or notrunc is specified, results in each input block being copied whole and singly from input to output. A read() from a pipe returns whatever is available, up to the requested size, so those input blocks can be short. This means that count=10 results in only 10 read()s from the pipe, however many bytes each one returns, and likewise only 10 write()s to out.
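A scaled-down sketch of that behavior, assuming a POSIX sh and a dd that follows the spec. The `writer` function is hypothetical: it stands in for yes and emits its data as two separate 3-byte write()s one second apart, so dd's single permitted read() sees only the first of them.

```shell
# Hypothetical stand-in for yes: two separate 3-byte write()s, 1s apart.
writer() { printf abc; sleep 1; printf def; }

# count=1 permits exactly one read() from the pipe. That read() returns as
# soon as "abc" is available, so dd copies a 3-byte short block and stops
# instead of waiting for a full 1k block.
got=$(writer 2>/dev/null | dd bs=1k count=1 2>/dev/null)
printf '%s\n' "$got"    # prints: abc
```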

However, when obs= is specified explicitly (and bs= is not), dd buffers input blocks and aggregates them into obs=-sized output blocks. So you just need a second dd and a blocking factor. Tell the first dd to aggregate as many input blocks as are necessary to fill your 1M block size, and from that point on both dds synchronize their i/o: the first writes 1M blocks to the pipe, and the second reads them at the same size and counts up to ten 1M blocks written into out.
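The same two-dd idea at toy scale, again with the hypothetical `writer` standing in for yes (two 3-byte writes one second apart). The sizes 3 and 3x2 are arbitrary stand-ins for 1k and 1kx1k, and the sketch deliberately keeps every block far below PIPE_BUF, so each write() from the first dd reaches the second dd in one piece.

```shell
# Hypothetical stand-in for yes: two separate 3-byte write()s, 1s apart.
writer() { printf abc; sleep 1; printf def; }

# No blocking stage: one 6-byte read() gets only the 3 bytes available.
direct=$(writer 2>/dev/null | dd bs=3x2 count=1 2>/dev/null)

# Blocking stage: the first dd reads 3-byte input blocks and, because
# ibs= and obs= are given separately, aggregates them into one 6-byte
# output block before writing it. The second dd's single 6-byte read()
# then receives that block whole.
blocked=$(writer 2>/dev/null | dd ibs=3 obs=3x2 2>/dev/null |
          dd bs=3x2 count=1 2>/dev/null)

printf '%s\n' "$direct"     # prints: abc
printf '%s\n' "$blocked"    # prints: abcdef
```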

Of course, if you knew for certain the size of the writes yes was doing, you wouldn't need two dds at all, because you could synchronize a single dd with yes directly. But when you can't know, dd is there. That's why we have it. Look:

yes | dd ibs=3333 obs=3333x444 |
dd bs=3333x444 count=10 of=out
ls -l out

10+0 records in
10+0 records out
14798520 bytes (15 MB) copied, 0.186561 s, 79.3 MB/s
-rw-r--r-- 1 mikeserv mikeserv 14798520 Mar 23 18:30 out

See? It works with i/o blocks of any shape or size.
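The byte counts reported in both runs are simply the output block size times count=10. A quick check with POSIX shell arithmetic (nothing here runs dd; it only reproduces the numbers):

```shell
# Expected size of "out" = block size * count.
first=$((1024 * 1024 * 10))   # 1k x 1k blocks, count=10
second=$((3333 * 444 * 10))   # 3333x444 blocks, count=10
echo "$first"     # prints: 10485760
echo "$second"    # prints: 14798520
```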

Here's a snippet from the POSIX spec:

  • ibs=expr
    • Specify the input block size, in bytes, by expr (default is 512).
  • obs=expr
    • Specify the output block size, in bytes, by expr (default is 512).
  • bs=expr
    • Set both input and output block sizes to expr bytes, superseding ibs= and obs=. If no conversion other than sync, noerror, and notrunc is specified, each input block shall be copied to the output as a single block without aggregating short blocks.
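To see that last rule in action, compare dd's "records out" counts for bs= against separate ibs=/obs= of the same size. As before, the hypothetical `writer` produces two 3-byte short blocks; this assumes dd reports its statistics on stderr in the POSIX "whole+partial records out" form.

```shell
# Hypothetical stand-in for yes: two separate 3-byte write()s, 1s apart.
writer() { printf abc; sleep 1; printf def; }

# dd writes its statistics to stderr; keep only the "records out" line.
with_bs=$(writer 2>/dev/null | dd bs=512 2>&1 >/dev/null | grep 'records out')
with_ibs_obs=$(writer 2>/dev/null | dd ibs=512 obs=512 2>&1 >/dev/null |
               grep 'records out')

printf '%s\n' "$with_bs"        # 0+2 records out: each short block copied singly
printf '%s\n' "$with_ibs_obs"   # 0+1 records out: short blocks aggregated
```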

You'll also find some of this explained better here.
