Stéphane Chazelas
The whole point of piping commands is to run them concurrently with one reading the output of the other. If you want to run them sequentially, and if we keep the plumbing metaphor, you'll need to pipe the output of the first command to a bucket (store it) and then empty the bucket into the other command.

But doing it with pipes means having two processes for the first command (the command and another process reading its output from the other end of the pipe to store in the bucket), and two for the second one (one emptying the bucket into one end of the pipe for the command to read it from the other end).
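Spelled out, that naive bucket version looks something like this (a sketch; `cmd1` and `cmd2` are defined here as stand-in functions just so the example runs):

```shell
# Hypothetical stand-ins for the two commands; any commands would do.
cmd1() { seq 3; }
cmd2() { wc -l; }

tmp=$(mktemp) || exit

cmd1 | cat > "$tmp"   # cmd1 plus one extra process filling the bucket
cat "$tmp" | cmd2     # one extra process emptying the bucket into cmd2

rm -f -- "$tmp"
```

Each side of the store spawns a `cat` whose only job is to shuttle the data through a pipe, which is exactly the overhead described above.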

For the bucket, you'll need either memory or the file system. Memory doesn't scale well and you need the pipes. The filesystem makes much more sense. That's what /tmp is for. Note that the disks are likely not to ever see the data anyway as the data may not be flushed there until much later (after you remove the temp file), and even if it is, it will likely still remain in memory (cached). And when it's not, that's when the data would have been too big to fit in memory in the first place.

Note that temporary files are used all the time in shells. In most shells, here documents and here strings are implemented with temp files.

In:

cat << EOF
foo
EOF

Most shells create a tempfile, open it for writing and for reading, delete it, fill it up with foo, and then run cat with its stdin duplicated from the fd open for reading. The file is deleted even before it is filled up (that gives the system a clue that whatever is written there doesn't need to survive a power loss).
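You can peek at this on Linux, assuming /proc is available (note that shells which implement here-documents with pipes instead, such as recent bash versions for small here-docs, will report a pipe rather than a deleted temp file):

```shell
# Ask the kernel what stdin really is inside a here-document.
# Depending on the shell and version, this prints either the path
# of an already-deleted temp file or a pipe:[...] entry.
readlink /proc/self/fd/0 << 'EOF'
foo
EOF
```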

You could do the same here with:

tmp=$(mktemp) && {
  rm -f -- "$tmp" &&
    cmd1 >&3 3>&- 4<&- &&
    cmd2 <&4 4<&- 3>&-
} 3> "$tmp" 4< "$tmp"

Then, you don't have to worry about clean-up as the file is deleted from the start. No need for extra processes to get the data in and out of buckets; cmd1 and cmd2 do it by themselves.
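For instance, with seq and wc as hypothetical stand-ins for cmd1 and cmd2, the same construct gives:

```shell
# seq writes 1..5 into the (already unlinked) temp file via fd 3,
# then wc -l reads that same file from the start via fd 4.
tmp=$(mktemp) && {
  rm -f -- "$tmp" &&
    seq 5 >&3 3>&- 4<&- &&
    wc -l <&4 4<&- 3>&-
} 3> "$tmp" 4< "$tmp"
```

That prints 5 (possibly padded with spaces, depending on the wc implementation): fd 4 was opened on the file before it was unlinked, so the second command still sees everything the first one wrote.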

If you wanted to store the output in memory, using a shell for that would not be a good idea. First, shells other than zsh can't store arbitrary data in their variables, so you'd need to use some form of encoding. And then, to pass that data around, you'd end up duplicating it in memory several times, if not writing it to disk when using a here-doc or here-string.

You could use perl instead, for instance:

 perl -MPOSIX -e '
   # map the wait status to a shell-like exit code (128+signum if killed)
   sub status() {return WIFEXITED($?) ? WEXITSTATUS($?) : WTERMSIG($?) | 128}
   $/ = undef;   # slurp mode: read the whole output at once
   open A, "-|", "cmd1" or die "open A: $!\n";
   $out = <A>;
   close A;
   $status = status;
   exit $status if $status != 0;

   open B, "|-", "cmd2" or die "open B: $!\n";
   print B $out;
   close B;
   exit status'
