
I was writing a shell script to generate a string in a particular format so that it can be used as input in one of the XML files I am working with.

Given an input file in the format <attribute field>,<data_type>,<size>:

instanceid,varchar,256
sysdate,date
status,number
notes,varchar,4000
created_on,date

I want to store in a variable a "checksum" expression such as md5( INSTANCEID || STATUS || NOTES). That is, I want all the attribute fields, except those whose type is date, joined with ||.

The script I have written is this:

IFS=$'\n'
file=$(cat source.txt)
line_number=$(cat source.txt | wc -l)
checksum="md5( "
for line in $file
do
let line_number=line_number-1
data_field=$(echo $line | cut -f1 -d','| tr "a-z" "A-Z")
data_type=$(echo $line | cut -f2 -d',' | tr "a-z" "A-Z")
if [ $data_type != "DATE" ]  && [ $line_number -gt 0 ]
  then checksum+="$data_field || "
elif [ $data_type != "DATE" ] && [ $line_number -eq 0 ]
  then checksum+=" $data_field "
fi
done
checksum+=")"
echo $checksum

This script works fine for all input scenarios except when the last line has an attribute with date as its type.

In that case the variable ends up with a value like md5( INSTANCEID || STATUS || NOTES || )

I tried to check whether the last line was a date using the tail command, but that fails again if the last few lines all have date as their type.

How can I get rid of the || that appears at the end?


3 Answers

3

The quick answer is checksum="${checksum% || })" instead of checksum+=")". Just unconditionally add the || string in each step and then strip off the last unnecessary one at the very end (so the line_number computation is no longer needed).
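Applied to the loop from the question, the quick fix might look like the sketch below. It assumes bash (for `+=` and `$'\n'`). The temporary file and the `src` variable are not part of the original script; they are there only so the example runs on its own.

```shell
#!/usr/bin/env bash
# Sample input from the question, written to a temp file (an assumption
# made only to keep this sketch self-contained).
src=$(mktemp)
printf '%s\n' 'instanceid,varchar,256' 'sysdate,date' 'status,number' \
    'notes,varchar,4000' 'created_on,date' > "$src"

IFS=$'\n'
file=$(cat "$src")
checksum="md5( "
for line in $file
do
    data_field=$(echo "$line" | cut -f1 -d',' | tr "a-z" "A-Z")
    data_type=$(echo "$line" | cut -f2 -d',' | tr "a-z" "A-Z")
    if [ "$data_type" != "DATE" ]
    then
        # Always append the separator; no line counting is needed.
        checksum+="$data_field || "
    fi
done
# Strip the single trailing " || " (if present) and close the parenthesis.
checksum="${checksum% || })"
echo "$checksum"
rm -f "$src"
```

If every line has the date type, nothing is appended, the `%` pattern does not match, and the result is simply `md5( )`.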

A better way to do this is

awk -F, 'BEGIN { printf "md5( " } 
         toupper($2) != "DATE" { printf "%s%s", sep, toupper($1); sep = " || " }
         END { print ")" }' source.txt
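Since the question asks for the result in a variable, the awk command can be wrapped in a command substitution. This is a minimal sketch; the temporary file and the `src` variable are assumptions used only to make it self-contained, standing in for source.txt.

```shell
#!/bin/sh
# Sample input from the question, written to a temp file (stand-in for
# source.txt; an assumption for illustration only).
src=$(mktemp)
printf '%s\n' 'instanceid,varchar,256' 'sysdate,date' 'status,number' \
    'notes,varchar,4000' 'created_on,date' > "$src"

# Command substitution captures awk's output and drops the trailing newline.
checksum=$(awk -F, 'BEGIN { printf "md5( " }
    toupper($2) != "DATE" { printf "%s%s", sep, toupper($1); sep = " || " }
    END { print ")" }' "$src")

echo "$checksum"
rm -f "$src"
```

Because `sep` is empty until the first non-date field has been printed, the separator only ever appears between fields, so there is nothing to strip afterwards.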
1
  • It’s remarkable how rarely cat is useful in a shell script.  $(cat source.txt | wc -l) is a classic useless use of cat; if you needed to count the lines in a file, $(wc -l < source.txt) is a much cleaner way of doing it.
  • But you don’t need to count the lines in source.txt.
  • file=$(cat source.txt) is an ugly way to read a file;
    while read …
    do
        ︙
    done < filename
    is better.  read has the benefit that it can split lines into fields for you.
  • It’s silly to run tr twice for each line of the file when you need only to run it once on the entire file.  In some situations,
    tr … < filename | while read …
    do
        ︙
    done
    works nicely.  But there’s a problem with this: the while loop runs in a subshell, so changes that you make to shell variables (e.g., checksum) won’t be visible after the loop ends.  Terdon shows one way to work around that problem; here’s another one:
    tr … < filename | { while read …
    do
            ︙
        commands that potentially change checksum.
            ︙
    done
        ︙
    commands that use $checksum.
        ︙
    }
  • As you’ve discovered, identifying the last occurrence of something can be difficult.  It’s often easier to identify the first:

    checksum="md5("
    first=1
    tr "a-z" "A-Z" < source.txt | { while IFS=, read data_field data_type size
    do
        if [ "$data_type" != "DATE" ]
        then
            if [ "$first" ]
            then
                first=
            else
                checksum+=" || "
            fi
            checksum+="$data_field"
        fi
    done
    checksum+=")"
    echo "$checksum"
    }
    

    Note that you really don’t need to test if [ "$data_type" != "DATE" ] twice.
    Also note that you should always quote references to shell variables (e.g., "$data_type") unless you have a good reason not to and you’re sure you know what you’re doing.

  • As a further optimization, you can eliminate the first variable and simply use checksum itself to identify your first iteration through the loop:

    checksum=
    tr "a-z" "A-Z" < source.txt | { while IFS=, read data_field data_type size
    do
        if [ "$data_type" != "DATE" ]
        then
            if [ "$checksum" != "" ]
            then
                checksum+=" || "
            fi
            checksum+="$data_field"
        fi
    done
    checksum="md5($checksum)"
    echo "$checksum"
    }
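
The subshell issue mentioned above can be seen in isolation with a small sketch. It assumes bash with default settings (the `lastpipe` option off); some other shells, such as ksh and zsh, run the last stage of a pipeline in the current shell and behave differently. The variable names `lost`, `kept` and `inside` are made up for this illustration.

```shell
#!/usr/bin/env bash
# The loop is the last stage of a pipeline, so in default bash it runs in
# a subshell and its changes to "lost" vanish when the pipeline ends.
lost=0
printf 'a\nb\n' | while read -r line; do lost=$((lost + 1)); done
echo "after the pipeline: $lost"

# Grouping the loop with the command that uses the variable keeps both in
# the same subshell, so the updated value is still visible there.
inside=$(printf 'a\nb\n' | {
    kept=0
    while read -r line; do kept=$((kept + 1)); done
    echo "$kept"
})
echo "inside the group: $inside"
```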
    
0

You don't need anything half as complicated as what you've written. You could just do:

#!/usr/bin/env bash

checksum="md5("
## Read each line into the fields array (read -a fields), with fields
## separated by commas (IFS=,)
while IFS=, read -a fields
do
    ## If the 2nd element of the array is not "DATE"
    if [ "${fields[1]}" != "DATE" ]
    then
        ## Add this to $checksum
        checksum+="${fields[0]} || "
    fi
## The tr is making everything upper case and then feeds
## directly into the while loop.
done < <(tr "a-z" "A-Z" < "$1")
## Get rid of the last || and add the closing ")"
checksum="${checksum% || })"
printf "OUT is: %s\n" "$checksum"

You then run the script with your file as input:

$ foo.sh file
OUT is: md5(INSTANCEID || STATUS || NOTES)
2
  • (1) “The script you posted only stores the details of the first line of input.”  Huh?  file=$(cat source.txt) reads the entire file, and for line in $file loops through the lines.  (2) I’m shocked that < "$@" isn’t a syntax error.  If there are two or more parameters, it is equivalent to < "$*". Commented May 20, 2015 at 4:37
  • @G-Man 1) yes, I had originally misunderstood the requirements. That sentence was left over from my first draft of the answer, thanks for pointing it out. 2) You're absolutely right, I don't know what I was thinking. Fixed now. Commented May 20, 2015 at 11:20
