edited Mar 24, 2015 at 14:58

From the spec:

If the bs=expr operand is specified and no conversions other than sync, noerror, or notrunc are requested, the data returned from each input block shall be written as a separate output block; if the read() returns less than a full block and the sync conversion is not specified, the resulting output block shall be the same size as the input block.

So this is probably what causes your confusion. Yes: because dd is designed for blocking, when bs= is given, partial read()s are mapped 1:1 to partial write()s, or else padded out to the full bs= size with trailing NUL characters (spaces, if block or unblock is also requested) when conv=sync is specified.
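To see the 1:1 mapping (and the conv=sync padding) for yourself, here is a small sketch. It assumes a shell whose printf gets its output into the pipe before sleep runs (bash and dash both behave this way), and it relies on the one-second pause so that the first read() on the pipe finds only 3 bytes. short_writer is just a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: write 3 bytes, pause, then 3 more, so that dd's
# first read() on the pipe comes back short (3 bytes instead of 512).
short_writer() { printf 'abc'; sleep 1; printf 'def'; }

# bs=512 count=1: the one short read of 3 bytes becomes one 3-byte write.
plain=$(short_writer | dd bs=512 count=1 2>/dev/null | wc -c)

# Same again with conv=sync: the short input block is padded with NUL
# bytes up to the full 512-byte block size before it is written.
padded=$(short_writer | dd bs=512 count=1 conv=sync 2>/dev/null | wc -c)

echo "without conv=sync: $plain bytes"
echo "with conv=sync:    $padded bytes"
```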

However, this is only true when bs= is specified and no conversion other than sync, noerror, or notrunc is requested, as the very next sentence in the spec states:

If the bs=expr operand is not specified, or a conversion other than sync, noerror, or notrunc is requested, the input shall be processed and collected into full-sized output blocks until the end of the input is reached.

Without ibs= and/or obs= arguments this is easy to overlook, because ibs and obs are both the same size (512 bytes) by default. However, you can get explicit about input buffering by giving ibs= and obs= different sizes and not specifying bs= (because bs= takes precedence over both). For example, if you do:

IN| dd ibs=1| OUT

dd will read() one byte at a time and collect those bytes into 512-byte output blocks (the default obs= size), write()ing each block only once it is full (or at EOF). Otherwise, if you do...

IN| dd obs=kxk| OUT

dd will read() at most 512 bytes at a time (the default ibs= size), but it will collect that input into full-sized output blocks and write() every 1 MiB (kxk, i.e. 1024x1024 bytes) output block in full, except possibly the last one, which can be short because the input reached EOF.
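Both cases can be checked with a sketch like the following, assuming a dd that prints the usual "records in / records out" summary on stderr (GNU and BSD dd both do). /dev/zero is used as the source because reads from it are never short, so only the reading dd's blocking matters. In the second case the "records in" line may vary with how the pipe delivered the data, but the "records out" line should not.

```shell
#!/bin/sh
# Case 1: ibs=1, obs left at its 512-byte default, no bs=.
# 1024 bytes in; dd reads them one at a time and writes two full 512-byte blocks.
small=$(dd if=/dev/zero bs=1024 count=1 2>/dev/null |
        dd ibs=1 of=/dev/null 2>&1)

# Case 2: ibs left at 512, obs=1024x1024 (1 MiB), no bs=.
# 3000000 bytes in; dd writes two full 1 MiB blocks plus one short final block.
big=$(dd if=/dev/zero bs=1000000 count=3 2>/dev/null |
      dd obs=1024x1024 of=/dev/null 2>&1)

# dd reports its block counts on stderr, captured above via 2>&1.
printf 'ibs=1:\n%s\n\nobs=1024x1024:\n%s\n' "$small" "$big"
```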

Also from the spec, though:

count=n
- Copy only n input blocks.

count= counts input blocks, i.e. blocks of the ibs= (or bs=) size, so let's change it up a little bit and try your yes example again with an actual blocking factor. Like:

yes | dd ibs=k obs=kxk |     #input bs = 1k, output bs = 1k * 1k
dd bs=kxk count=10 of=out    #input and output bs = 1k * 1k

If you have a POSIX dd on hand, your results should look like this:

10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.127955 s, 81.9 MB/s
$ ls -l out
-rw-r--r-- 1 mikeserv mikeserv 10485760 Mar 23 17:37 out

You have to synchronize I/O with dd to handle non-seekable inputs. In other words, make the pipe buffers explicit and they cease to be a problem. That's what dd is for. I suggest you read through the POSIX spec on dd. Particularly important is that you learn the different implications of each of the [ioc]?bs= operands (ibs=, obs=, cbs=, and bs=).

In the above case the unknown quantity is the number of bytes yes write()s to the pipe each time. dd is specified for blocking out its input, and so its bs= argument (when no conversion other than sync, noerror, or notrunc is requested) results in each input block being copied whole and singly from input to output, however short that block is. This means that count=10 results in only 10 read()s from the pipe and likewise 10 write()s to out, whatever size each of those read()s happened to return.
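Here's a rough way to see that unknown quantity at work. It's a sketch that assumes Linux-like pipe behaviour, where a read() on a pipe returns only what is buffered at that moment (at most the pipe's capacity, 64 KiB by default on Linux), so the exact byte count will vary from run to run.

```shell
#!/bin/sh
# Ask dd for ten 1 MiB blocks read straight from yes's pipe.
# count= counts read()s, and each read() returns whatever yes has put in
# the pipe so far, so normally much less than 10 MiB comes out.
wanted=$((10 * 1024 * 1024))
got=$(yes | dd bs=1024x1024 count=10 2>/dev/null | wc -c)
echo "asked for $wanted bytes, got $got"
```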

However, when obs= is explicitly specified (and bs= is not), dd will buffer input blocks and aggregate them into obs=-sized output blocks. And so you just need a second dd and a blocking factor. Tell the first dd to aggregate as many input blocks as are necessary to meet your 1M block-size requirement, and from that point on both dds will synchronize their I/O: one writes 1M blocks to the pipe, and the second reads them in the same size and counts up to ten 1M blocks written into out.

Of course, if you knew for certain the size of the writes yes was doing, you wouldn't even need two dds, because you could just synchronize dd with yes directly. But when you can't know, dd is there. That's why we have it. Look:

yes | dd ibs=333333 obs=333333x44 | 
dd bs=333333x44 count=10 of=/dev/null

10+0 records in
10+0 records out
146666520 bytes (147 MB) copied, 1.69232 s, 86.7 MB/s

See? It works with i/o blocks of any shape or size.
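And for the simpler case mentioned earlier, where you already know the writer's write() size and can just match it: in this sketch a dd writing fixed 512-byte blocks stands in for such a writer. POSIX guarantees PIPE_BUF is at least 512 bytes, so each of those writes is atomic (never interleaved with other data), and on Linux-like systems a reader using the same bs= should then count only full blocks. Treat it as an illustration under that assumption, not a guarantee for every system.

```shell
#!/bin/sh
# Writer with a known write() size: 100 blocks of 512 bytes each.
# Reader: the same bs=, so each read() should pick up exactly one whole block.
stats=$(dd if=/dev/zero bs=512 count=100 2>/dev/null |
        dd bs=512 count=10 of=/dev/null 2>&1)

# dd's summary goes to stderr, captured here via 2>&1;
# expect "10+0 records in" and "10+0 records out" (full blocks, none short).
printf '%s\n' "$stats"
```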

Here's a snippet from the POSIX spec:

ibs=expr
- Specify the input block size, in bytes, by expr (default is 512).
obs=expr
- Specify the output block size, in bytes, by expr (default is 512).
bs=expr
- Set both input and output block sizes to expr bytes, superseding ibs= and obs=. If no conversion other than sync, noerror, and notrunc is specified, each input block shall be copied to the output as a single block without aggregating short blocks.
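The difference between bs= and a matching ibs=/obs= pair is easy to demonstrate with a writer that produces two short writes. As with any timing-based demo, this sketch assumes the one-second pause is long enough for dd to read() each 2-byte write separately; two_short_writes is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: two 2-byte writes separated by a pause,
# so dd sees two short (2-byte) reads instead of one full 4-byte block.
two_short_writes() { printf 'ab'; sleep 1; printf 'cd'; }

# bs=4: each short input block is copied out on its own -> "0+2 records out".
with_bs=$(two_short_writes | dd bs=4 of=/dev/null 2>&1)

# ibs=4 obs=4, no bs=: the short input blocks are aggregated into one
# full 4-byte output block -> "1+0 records out".
with_ibs_obs=$(two_short_writes | dd ibs=4 obs=4 of=/dev/null 2>&1)

printf 'bs=4:\n%s\n\nibs=4 obs=4:\n%s\n' "$with_bs" "$with_ibs_obs"
```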

You'll also find some of this explained better here.

answered Mar 24, 2015 at 1:47

mikeserv
