This is a bit tricky; I'm trying to work out the best approach to this problem. I have a couple of ideas, but they seem really hacky and I'd like something a little more elegant.
I want to parse a whitespace-delimited file, ignoring #comment lines and complaining about any non-empty lines that don't have exactly 4 fields. This is easy enough in awk:
awk '/^#/ {next}; NF == 0 {next}; NF != 4 {exit 1}; { dostuff }'
The trick is that what I want to do with the data is actually set the fields as variables in bash and then run a bash function, unless $2 contains a specific value.
Here is some pseudocode (mostly real code, but in mixed languages) to explain what I mean:
# awk
/^#/ {next}
NF == 0 {next}
NF != 4 {exit 1}
$2 == "manual" {next}
# bash
NAME=$1
METHOD=$2
URL=$3
TAG=$4
complicated_bash_function_that_calls_lots_of_external_commands
# then magically parse the next line with awk.
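(Just to make that concrete: a rough pure-bash version of the same logic might look like the sketch below, with input.file as a placeholder filename. It drops awk entirely, though, which is part of why I'm hoping for something more elegant.)

# Sketch: the same logic in plain bash; "input.file" is a placeholder.
while read -r NAME METHOD URL TAG EXTRA; do
    case $NAME in ''|'#'*) continue ;; esac    # skip blank lines and comments
    if [ -z "$TAG" ] || [ -n "$EXTRA" ]; then
        echo "line does not have exactly 4 fields" >&2
        exit 1
    fi
    [ "$METHOD" = manual ] && continue          # skip "manual" entries
    complicated_bash_function_that_calls_lots_of_external_commands
done < input.file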
I don't know how to do this hand-off between awk and bash without some ugly workarounds, such as calling awk or sed separately for each line of the file. (I originally phrased the question as "How to call a bash function from within awk, or each output line of awk from within bash?")
It might work to turn the bash function into its own script and have it accept arguments 1, 2, 3, and 4 as above. I'm not sure how to call that from within awk, though; hence the question title.
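(I gather awk's system() can run an external command, so perhaps something like the sketch below, where ./dostuff.sh is a hypothetical name for that script; the naive quoting of the fields is one thing that worries me.)

# process.awk (sketch): hand each qualifying line's fields to an external script
/^#/    {next}
NF == 0 {next}
NF != 4 {exit 1}
$2 == "manual" {next}
{
    # Naive quoting: fields containing shell metacharacters would break this.
    status = system("./dostuff.sh " $1 " " $2 " " $3 " " $4)
    if (status != 0) exit status
}

This would then be run as awk -f process.awk input.file.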
What I would actually prefer is to have the whole thing in one file and make it a bash script, calling awk from within bash rather than bash from within awk. But I will still need to call the bash function from within awk, once for each non-comment line of the input file.
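The closest thing I can picture is piping awk's output into a read loop, roughly like the sketch below (input.file is again a placeholder), but I'm not sure that is the right way to structure it:

# Sketch: awk does the filtering/validation, bash does the per-line work.
while read -r NAME METHOD URL TAG; do
    complicated_bash_function_that_calls_lots_of_external_commands
done < <(awk '/^#/ {next} NF == 0 {next} NF != 4 {exit 1} $2 == "manual" {next} {print}' input.file)

Process substitution keeps the loop in the current shell (unlike awk ... | while read, which would run it in a subshell), although the exit 1 from awk is then awkward to check.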
How can I do this?
Comments: A while IFS= read -r loop or similar? Or the system() command in awk. Note that system() just executes the command in a subshell and returns to awk; the output of that command may even be mangled together with the output of your awk script in some funny cases unless you use fflush("/dev/stdout"). If you need to parse the output of the command, you need to use the | getline syntax.
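To illustrate those last two points, a minimal awk fragment (with ./dostuff.sh again a hypothetical command):

{
    # Flush awk's own buffered output first, so it does not get interleaved
    # with whatever the child command prints.
    fflush("/dev/stdout")
    system("./dostuff.sh " $1)

    # Or, to capture the command's output instead of just running it:
    cmd = "./dostuff.sh " $1
    cmd | getline result
    close(cmd)
}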