
I have a shell script in which I am trying to pass a date argument to $ARGV[1], but the script gives blank output where the date should be.

Here is the script:

#!/bin/bash
dt=$(date -d "yesterday" '+%m%d%Y')
cat /tmp/log.$AUTOSERVE.$dt \
  | perl -ne '/STATUS:\s+(\w+).+MACHINE:\s+(\w+.\w+.\w+)$/ && print join( "\t", $1, $2 ). "\n"' \
  | grep -E '(SUCCESS|FAILURE|TERMINATED)' \
  | cut -f2 \
  | sort \
  | uniq -c \
  | perl -ne '/^\s+(\d+)\s+(.*)$/ && print join("\t", '$ARGV[1]', $ENV{AUTOSERV}, $2, $1) . "\n"' $date_YYYYMMDD \
  > /tmp/output.txt

What am I doing wrong?

Let me explain what I am trying to do here:

We have log files that are generated every day, with names like

log.$AUTOSERVE.mmddyyyy

The log file contains data like the following (I changed the input data to make it easier to understand):

Time            Message           
____________________________________________

[11/16/2023 07:13:45]    CAUAJM_I_12345 The application has rollover

[11/16/2023 07:13:45]     CAUAJM_I_11111  The machine 111.test.com has lost connection
[11/16/2023 07:13:45]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: FAILURE         JOB: ABC MACHINE: 111.test.com EXITCODE: 1

[11/16/2023 07:13:45]      CAUAJM_I_40245 [222.test.com connected to ABC]

[11/16/2023 07:13:45] CAUAJM_I_40245 EVENT: CHANGE_STATUS    ALARM: JOBFAILURE         JOB: ABC EXITCODE: 1
[11/16/2023 07:13:45]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: TERMINATED      JOB: XYZ MACHINE: 222.test.com
[11/16/2023 07:13:46]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: STARTING        JOB: 123 MACHINE: 333.test.com
[11/16/2023 07:13:46]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: SUCCESS         JOB: 456 MACHINE: 444.test.com EXITCODE: 0
[11/16/2023 07:13:46]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: SUCCESS         JOB: ABC123 MACHINE: 555.test.com
[11/16/2023 07:13:45]      CAUAJM_I_40245 [222.test.com connected to ABC]
[11/16/2023 07:13:45]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: FAILURE         JOB: ABC MACHINE: 111.test.com EXITCODE: 1
[11/16/2023 07:13:45]      CAUAJM_I_40245 [333.test.com connected to 123]
[11/16/2023 07:13:45]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: TERMINATED      JOB: XYZ MACHINE: 222.test.com
[11/16/2023 07:13:46]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: STARTING        JOB: 123 MACHINE: 333.test.com
[11/16/2023 07:13:46]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: SUCCESS         JOB: 456 MACHINE: 444.test.com 
[11/16/2023 07:13:46]      CAUAJM_I_40245 EVENT: CHANGE_STATUS    STATUS: SUCCESS         JOB: ABC123 MACHINE: 555.test.com EXITCODE: 0

This shell script filters the log file for the MACHINE and STATUS search strings and counts how many jobs have run on each machine.

The output I am getting is:

            NP2     111.test.com      2
            NP2     222.test.com      2
            NP2     444.test.com      2
            NP2     555.test.com      2

I tried changing $date_YYYYMMDD to $dt:

cat /tmp/log.$AUTOSERVE.dt \
  | perl -ne '/STATUS:\s+(\w+).+MACHINE:\s+(\w+.\w+.\w+)$/ && print join( "\t", $1, $2 ). "\n"' \
  | grep -E '(SUCCESS|FAILURE|TERMINATED)' \
  | cut -f2 \
  | sort \
  | uniq -c \
  | perl -ne '/^\s+(\d+)\s+(.*)$/ && print join("\t", $ARGV[1], $ENV{AUTOSERV}, $2, $1) . "\n"' $dt \
  > /tmp/output.txt

But I get the following error:

Can't open 11152023: No such file or directory.

I have an environment variable $AUTOSERVE that provides the NP2 value in this output. What I am expecting is:

11152023   NP2     111.test.com      2
11152023   NP2     222.test.com      2
11152023   NP2     444.test.com      2
11152023   NP2     555.test.com      2

  • I am not familiar with perl, but assuming that in perl terms it is fine to use "$ARGV[1]" instead of '$ARGV[1]' for your goal, you should use the former, as the latter conflicts with your current quoting in bash. Commented Nov 17, 2023 at 7:21
  • @TomYan I have tried that, but it's still not accepting the date argument. Commented Nov 17, 2023 at 7:22
  • You are trying to use the shell variable date_YYYYMMDD as an argument to the perl code, but this variable is not defined anywhere in your shell script; you have only defined dt. Apart from that, your code uses unescaped single quotes inside single quotes, i.e. perl -ne ' ... '$ARGV[1]' ... '. No idea what you are trying to achieve here, but this basically makes the shell interpret $ARGV[1] and concatenate the result with the rest of the string. Just skip these internal single quotes. Commented Nov 17, 2023 at 7:36
  • Also, I'm not so sure you can use ARGV the way you expect anyway when you are using -n at the same time. (And at least when you don't use -n, $date_YYYYMMDD actually appears to be $ARGV[0], because foo in -e foo is not considered an argument in that context, apparently.) See if this helps. Commented Nov 17, 2023 at 7:53

3 Answers


Sounds like you want something like:

#! /bin/sh -
DT=$(date -d yesterday +%m%d%Y) || exit
export DT
exec perl -lne '
  if (
    ($status, $machine) = /STATUS:\s+(\w+).+MACHINE:\s+(\w+\.\w+\.\w+)$/ and
    $status =~ /^(SUCCESS|FAILURE|TERMINATED)\z/
  ) {$count{$machine}++}
  END {
    for (keys %count) {
      print join "\t", $ENV{DT}, $ENV{AUTOSERVE}, $_, $count{$_};
    }
  }' < ~/tmp/log."$AUTOSERVE.$DT" > ~/tmp/output.txt

Note:

  • one MUST NOT use files with fixed names in world-writable directories such as /tmp (hence the switch to ~/tmp here; alternatively use a dedicated area in /var, ~/var, ~/.local or $XDG_RUNTIME_DIR...).

  • there's nothing bash-specific in that code, so no need to add that bash dependency.

  • with -n, extra arguments to perl are taken as the input files for the script, not as parameters to it.

  • as Chris already said, you had issues with your quoting.

  • you have an AUTOSERV / AUTOSERVE discrepancy.

  • . is a regex operator that matches any single character. Use \. or [.] to match a literal dot.

  • beware that usage of date is GNU-specific. Not all date implementations support a -d option, and among those that do, it can be for something totally unrelated like on BSDs or they don't recognise yesterday (like with the date of busybox or toybox). perl can also do date manipulation¹ if you need that script to be portable to non-GNU systems.

  • You could easily change that to use a single regexp such as:

    /STATUS:\s+(?:SUCCESS|FAILURE|TERMINATED)\b.+MACHINE:\s+(\w+\.\w+\.\w+)$/
    
  • replace keys %count with sort keys %count if you want the list of machines to be sorted lexically.

  • exec, common in wrapper scripts like this one, is just there to save a process. It tells the shell to run perl in the same process rather than in a child process that it then waits for. cmd does fork()+exec(cmd)+wait(child), while exec cmd (which maybe should have been called nofork cmd) is just exec(cmd), so even though it's longer to type, it's simpler for the system to run and uses fewer resources.

  • %m%d%Y is not a very good choice of timestamp format. It's ambiguous and its lexical order (like in the output of ls) does not match the chronological order. %Y-%m-%d or %F for short is much better as it's universally recognised, and sorts lexically chronologically (at least for years 0001 to 9999).

  • cat is the command to concatenate files; it makes little sense to use it for one file. Using cmd < input > output (or <input cmd >output, but not cmd > output < input) also has the benefit that if input can't be opened for reading, then cmd won't be run and output won't be clobbered.


¹ For instance here by adding -MPOSIX and a BEGIN{@t = localtime; $t[3]--; $dt = strftime "%m%d%Y", @t} or even as a hack just -M'POSIX;@t = localtime; $t[3]--; $dt = strftime "%m%d%Y", @t'.


One issue that jumps out at me is in the second perl line:

perl -ne '/^\s+(\d+)\s+(.*)$/ && print join("\t", '$ARGV[1]', $ENV{AUTOSERV}, $2, $1) . "\n"' $date_YYYYMMDD

You come out of single quotes just in time to use $ARGV[1], so the shell has a stab at parsing it. Typically the shell variable $ARGV won't be set, so the resulting line passed to perl is this:

perl -ne '/^\s+(\d+)\s+(.*)$/ && print join("\t", [1], $ENV{AUTOSERV}, $2, $1) . "\n"' $date_YYYYMMDD

and that's syntactically valid (but unlikely to be useful) so you won't get any errors.
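To see exactly what perl receives, you can hand the same quoting to printf instead of perl. This is a small sketch. The function name show_perl_code is made up for illustration, and the sketch assumes no shell variable called ARGV is set (it unsets one to be sure):

```shell
#!/bin/sh
# Print the exact argument the shell builds from the question's quoting.
show_perl_code() {
  # Make sure there is no shell variable named ARGV (the usual situation).
  unset ARGV
  # The first '...' closes right before $ARGV[1] and a new one opens right after it,
  # so the shell expands $ARGV (to nothing) and keeps "[1]" as literal text.
  printf '%s\n' '/^\s+(\d+)\s+(.*)$/ && print join("\t", '$ARGV[1]', $ENV{AUTOSERV}, $2, $1) . "\n"'
}

show_perl_code
```

The printed line is the perl code quoted above, with [1] where $ARGV[1] used to be.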

If you remove the two single quotes in the middle of the line, you'll almost get something that looks like it might be what you want. You'll also need to swap out -n for an explicit loop that reads from stdin, so that your @ARGV value can be captured. Arrays, including @ARGV, are indexed from zero, so I've changed that too.

perl -e 'while (<STDIN>) { chomp; /^\s+(\d+)\s+(.*)$/ && print join("\t", $ARGV[0], $ENV{AUTOSERV}, $2, $1) . "\n" }' "$date_YYYYMMDD"

Here is an alternative pipeline for you that will take your source file and produce the stated output:

awk -v date="$(date --date 'yesterday' +'%m%d%Y')" '

    # Count instances of IP address for finished jobs
    /SUCCESS|FAILURE|TERMINATED/ {
        if (m = index($0, "MACHINE:")) {
            # address is after "machine"
            ip = substr($0, m+9, length($0))

            if (s = index(ip, " ")) {
                # discard trailing text too
                ip = substr(ip, 1, s-1)
            }

            # capture address
            seen[ip]++
        }
    }

    # Output list of addresses and counts
    END {
        OFS="\t"
        for (ip in seen) {
            print date, ENVIRON["AUTOSERVE"], seen[ip], ip
        }
    }
' "/tmp/log.$AUTOSERVE.dt"

With AUTOSERVE=NP2 and a suitable date match, I get this result from your sample data file:

11162023        NP2     2       222.test.com
11162023        NP2     2       111.test.com
11162023        NP2     2       555.test.com
11162023        NP2     2       444.test.com

It's probably worth noting that the construct if (m = index($0, "MACHINE:")) is an assignment with a subsequent test for non-zero. If I had wanted a comparison, I would have had to use == instead of =. It could equivalently have been written like this:

m = index($0, "MACHINE:")
if (m != 0)
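You can check the assignment-in-condition idiom, and the MACHINE: extraction built on it, in isolation. This sketch uses two made-up input lines: one has a MACHINE: field followed by trailing text, and the other has no MACHINE: field at all. The name machine_of is illustrative.

```shell
#!/bin/sh
# index() returns the 1-based position of "MACHINE:", or 0 (false) if absent,
# so the assignment doubles as the test.
machine_of() {
  awk '{
    if (m = index($0, "MACHINE:")) {
      ip = substr($0, m + 9)                              # skip "MACHINE:" (8 chars) plus one space
      if (s = index(ip, " ")) ip = substr(ip, 1, s - 1)   # drop any trailing text
      print ip
    } else {
      print "no machine"
    }
  }'
}

printf '%s\n' \
  'STATUS: FAILURE JOB: ABC MACHINE: 111.test.com EXITCODE: 1' \
  'ALARM: JOBFAILURE JOB: ABC EXITCODE: 1' | machine_of
```

The first line prints 111.test.com. The second prints no machine, because index() returned 0 for it.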
  • Hi @chris, this awk script doesn't give any output. Don't we have to close the IF condition? Can you test it again to see whether you get output with the input I have given? Commented Nov 20, 2023 at 8:26
  • The output you are getting is different from what I am expecting. Commented Nov 20, 2023 at 9:54
  • No excuses. Just a typo! Really fixed this time, @NecroCoder Commented Nov 20, 2023 at 10:53

This is failing, first, because you are leaving the single quotes:

perl -ne '[...] '$ARGV[1]', [...]'

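So your $ARGV[1] is seen by the shell, not by perl. Next, with -n perl treats its arguments as input files and shifts them off @ARGV as it opens them, so $ARGV[0] is no longer there inside the loop (this is also where the "Can't open 11152023" error in the question comes from):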

$ perl -le 'print "$ARGV[0]"' foo 
foo
$ perl -nle 'print "$ARGV[0]"' foo
$ 

You can either use -n, which means you pipe data in or have perl open and iterate over files, or you can pass arguments, but not both.

So, what you really wanted to do was:

export dt=$(date -d "yesterday" '+%m%d%Y')
perl -ne '/STATUS:\s+(\w+).+MACHINE:\s+(\w+\.\w+\.\w+)$/ && print join( "\t", $1, $2 ). "\n"' "/tmp/log.$AUTOSERVE.$dt" |
  grep -E '(SUCCESS|FAILURE|TERMINATED)' |
  cut -f2 |
  sort |
  uniq -c |
  perl -ne '/^\s+(\d+)\s+(.*)$/ && print join("\t", $ENV{dt}, $ENV{AUTOSERV}, $2, $1) . "\n"' > /tmp/output.txt

Or, since you're already using perl, and assuming I'm guessing your data correctly:

export dt=$(date -d "yesterday" '+%m%d%Y')
perl -lne '/STATUS:\s+(SUCCESS|FAILURE|TERMINATED).+MACHINE:\s+(\w+\.\w+\.\w+)$/ &&
    print "$2"' "/tmp/log.$AUTOSERVE.$dt" |
  sort |
  uniq -c |
  perl -ne '/^\s+(\d+)\s+(.*)$/ &&
   print join("\t", $ENV{dt}, $ENV{AUTOSERV}, $2, $1) . "\n"' > /tmp/output.txt

Or even:

export dt=$(date -d "yesterday" '+%m%d%Y')
perl -lne '
if(/STATUS:\s+(SUCCESS|FAILURE|TERMINATED).+MACHINE:\s+(\w+\.\w+\.\w+)$/){
   $k{$2}++;
}
END{
  foreach $key (keys(%k)){
    print "$ENV{dt}\t$ENV{AUTOSERV}\t$key\t$k{$key}"
  }
}' "/tmp/log.$AUTOSERVE.$dt" > /tmp/output.txt
