4

I am running a utility, which emits the following:

  • Progress to its Standard Error
  • Data/Yield/Output to its Standard Output

I did not build the utility, nor can I easily modify it.

I wish to do the following:

  • Send its Standard Error directly to Standard Output
  • Buffer its Output, and flush it to Standard Output once the command exits
    (There is less than 10KiB of data liable to be here, so RAM is no issue.)

Can this be done in POSIX sh (and calling in only utilities common to both Linux and OpenBSD), without the indeterminism / potential race conditions / etc. arising from a named pipe or temporary file?

2 Answers

5

You should be able to do something like:

{
  cmd 2>&3 3>&- |
    awk '    {saved = saved $0 ORS}
         END {printf "%s", saved}' 3>&-
} 3>&1

Here, awk holds all of cmd's stdout output until cmd has finished, by which time cmd has already written its stderr output to the script's stdout.

awk reads until the writing end of the pipe is closed. Normally, that only happens when cmd (and every process it has forked that still holds a fd to the pipe) finishes. If, for some reason, cmd explicitly closes its stdout and later writes more progress to stderr, that extra progress could end up after the normal output. You could work around that by replacing cmd with (cmd; exit): the subshell also has its stdout open on the pipe, so awk would wait for it as well, and the subshell itself waits for cmd to finish (and reports cmd's exit status with exit).

But that should not be necessary with a well-behaved cmd. That would also not address the case where cmd forks (and doesn't wait for) a child process with its stdout redirected, which could write to its stderr long after awk or even that script finishes (probably a more likely scenario than a command that explicitly closes its stdout).
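For illustration, the (cmd; exit) variant could be sketched as below. cmd_demo is a hypothetical stand-in for the real utility, and run_buffered is a name chosen here for the example; neither comes from the original pipeline.

```shell
# Hypothetical stand-in for the real utility:
# progress goes to stderr, data goes to stdout.
cmd_demo() {
  echo 'progress 1' >&2
  echo 'data 1'
  echo 'progress 2' >&2
  echo 'data 2'
}

# Run "$@" with its stderr sent straight to our stdout (via fd 3),
# while awk holds its stdout until the pipe is closed.
# The subshell keeps the pipe open until the command has returned,
# even if the command closes its own stdout early.
run_buffered() {
  {
    ("$@"; exit) 2>&3 3>&- |
      awk '    {saved = saved $0 ORS}
           END {printf "%s", saved}' 3>&-
  } 3>&1
}

run_buffered cmd_demo
```

With this stand-in, both progress lines appear first and the two data lines follow once the command has exited.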

If cmd's output is not text, note that not all awk implementations can deal with byte 0 or extra-long lines, and a newline character will be added at the end if it was not already in the input.
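The added trailing newline can be checked with a quick sketch. hold is a hypothetical wrapper name for the awk command used above, and the input string is arbitrary.

```shell
# Hypothetical wrapper around the awk "hold everything" command.
hold() {
  awk '    {saved = saved $0 ORS}
       END {printf "%s", saved}'
}

# The input is 19 bytes with no trailing newline; awk appends ORS
# to the last record, so one extra byte comes out.
printf 'no trailing newline' | hold | wc -c
```

For text output that already ends in a newline, the byte count is unchanged.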

The POSIX toolchest doesn't have any command that can store arbitrary amounts of binary data in memory and display it later.

If perl is available, you can replace the awk command with just perl -0777 -pe ''.

Here, instead of memory, you could store the output in a temporary file, which would address the binary output issue and would likely scale better to larger outputs.

Unfortunately, the only POSIX way to create a temporary file reliably is using the m4 utility, but that utility (even though one mandated by POSIX) is not always found on production systems these days. You're probably more likely to find perl than m4.

In any case, that could be:

die() {
  [ "$#" -eq 0 ] || printf >&2 '%s\n' "$@"
  exit 1
}

tmpdir=${TMPDIR:-/tmp}
tmpfile=$(
  echo 'mkstemp(TEMPLATE)' |
    m4 -D "TEMPLATE=${tmpdir%/}/XXXXXXX"
) && [ -n "$tmpfile" ] || die 'Cannot get a temp file'

{
  rm -f -- "$tmpfile" || die "Cannot remove $tmpfile"
  cmd 2>&1 >&3 3>&- 4<&-
  cat <&4
} 3> "$tmpfile" 4< "$tmpfile"

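Here, unlinking the temporary file after it has been opened, but before running cmd, is a neat way to handle clean-up: the data stays reachable through fds 3 and 4, and the space is reclaimed as soon as they are closed.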

If you're targeting only GNU (remember "Linux" is not an OS, just a kernel found on a great variety of OSes, some of which don't even have a sh) and OpenBSD systems, then you should be able to use mktemp instead of m4 to create the temporary file.
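A minimal sketch of that mktemp variant follows, assuming a mktemp that accepts a template argument (as the GNU and OpenBSD ones do). cmd_demo, run_buffered_tmp and the buffer.XXXXXXXXXX template are hypothetical names chosen for the example.

```shell
# Hypothetical stand-in for the real utility.
cmd_demo() {
  echo 'progress 1' >&2
  echo 'data 1'
  echo 'progress 2' >&2
  echo 'data 2'
}

die() {
  [ "$#" -eq 0 ] || printf >&2 '%s\n' "$@"
  exit 1
}

run_buffered_tmp() {
  tmpdir=${TMPDIR:-/tmp}
  tmpfile=$(mktemp "${tmpdir%/}/buffer.XXXXXXXXXX") &&
    [ -n "$tmpfile" ] || die 'Cannot get a temp file'
  {
    # The file is already open on fds 3 (write) and 4 (read),
    # so it can be unlinked before the command runs.
    rm -f -- "$tmpfile" || die "Cannot remove $tmpfile"
    "$@" 2>&1 >&3 3>&- 4<&-
    cat <&4
  } 3> "$tmpfile" 4< "$tmpfile"
}

run_buffered_tmp cmd_demo
```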

4
  • This is really, really close to being a perfect/correct solution. Does sh not contain any builtins, or POSIX prescribe any similar functions, to mimic awk's functionality here, but without interfering in the text? Commented May 4, 2020 at 19:05
  • @KamilMaciorowski, you're making very good points, and I've incorporated some in my answer. A note however: though some awk implementations will choke on NUL and extra-long lines, they are required to deal with non-delimited lines. Commented May 5, 2020 at 5:58
  • 1
    Now I see a broader picture. After reading the definition of "shall" I understand "input files shall be text files" does not imply "non-text files may be unsupported because they don't meet the requirement". It means "the utility must support text files and may still be obliged to support (some) other files". So "input files shall" is not about the files, it's about the utility. In particular for awk "input files shall be text files", still RS may be something else than a newline, so the input may totally not comply with the formal definitions of "text file" and "line". Lesson learned. Commented May 5, 2020 at 6:58
  • It appears that { ($cmd 2>&3 3>&-) | perl -0777 -pe '' 3>&- ; } 3>&1 works perfectly. Commented May 6, 2020 at 18:35
3
#!/bin/bash

mycmd() {
  echo progress >&2
  echo out
  sleep 1
  echo progress >&2
  echo out
  sleep 1
  echo progress >&2
  echo out
  sleep 1
}

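# Make sure the tempfile is made in a secure way that avoids indeterminism / potential race conditions / etc.
# (mktemp is used here because it exists on both GNU and OpenBSD systems; tempfile is Debian-specific)
tmpfile="$(mktemp)" || exit
# Write output to the file
mycmd 2>&1 >"$tmpfile"; cat "$tmpfile"; rm -f -- "$tmpfile"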

# or if output is small: Write it to an environment variable.
# These are below the limit for "small" for different shells
# (Determined by making "mycmd" output that amount of data)
# ash  30 GB
# dash 30 GB
# bash 3 GB
# zsh  3 GB
# ksh  1 GB
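# fd 3 carries mycmd's stderr to the script's stdout; only its stdout is captured
{ out=$(mycmd 2>&3 3>&-); } 3>&1
printf '%s\n' "$out"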

Both solutions will be visible to root (on the file system and in /proc/*/environ) but to no other users on the system.

If your system has sponge and the shell supports /dev/stdout:

{
  cmd 2>&3 3>&- |
    sponge /dev/stdout 3>&-;
} 3>&1

(Ksh version:

$ ksh --version
  version         sh (AT&T Research) 93u+ 2012-08-01

)

5
  • Your first solution uses a named pipe (/tmp/buffer), which I specifically requested against. The second solution looks like it might be viable, however. 🤔 I just tested it, and it looks like even dash 0.5.10.2 supports newlines-in-variables! Commented May 4, 2020 at 15:14
  • In the process of trying to figure out why I didn't think of this already, I remembered why I had decided against this: does this not expose out to other processes on the system? (In this case, the data is somewhat private.) Commented May 4, 2020 at 15:17
  • 1
    /tmp/buffer is not a named pipe. It is a file. Commented May 4, 2020 at 15:45
  • 1
    How did you determine those 30 GB / 3 GB ...? What do you mean by ksh? ksh88, ksh93, ksh2020, pdksh, mksh or other pdksh derivative like the sh of openbsd? Commented May 4, 2020 at 17:46
  • @OleTange ha, well played... I have edited my question to clarify. Commented May 4, 2020 at 19:04
