If you do need a shell loop to process that, you could use read's IFS-splitting to split on |s instead of using awk:
#! /bin/bash -
shopt -s extglob # for +(...) ksh-style glob operator
trim() {
  typeset -n _var
  for _var do
    _var=${_var##+([[:space:]])}
    _var=${_var%%+([[:space:]])}
  done
}
{
  IFS= read -ru3 header_discarded
  while IFS='|' read -ru3 rev svn_path file download_options rest_if_any_discarded; do
    trim rev svn_path file download_options
    # do what you need with those variables
    typeset -p rev svn_path file download_options
  done
} 3< input_file.txt
Here, on your sample, that gives:
declare -- rev="1336"
declare -- svn_path="svn/Repo/PROD"
declare -- file="test2.txt"
declare -- download_options="PROGRAM APPLICATION_SHORT_NAME=\"SQLGL\""
declare -- rev="1334"
declare -- svn_path="svn/Repo/PROD"
declare -- file="test.txt"
declare -- download_options="REQUEST_GROUP REQUEST_GROUP_NAME=\"Program Request Group\" APPLICATION_SHORT_NAME=\"SQLGL\""
That input could also be seen as simple CSV with | as the field separator, so you could preprocess it with something like:
mlr --csvlite --fs '|' --ho --ragged clean-whitespace then \
  cut -of 'REV NUM,FILE NAME,DOWNLOAD OPTIONS'
That would take care of extracting the fields you want, however they're positioned in the input, and of the whitespace trimming (you would then pipe that to IFS='|' read -r rev file download_options).
As to why you're only getting the first word of each column, in:
REV_NUM=($(awk -F "|" 'NR>1 {print $1}' input_file.txt))
That unquoted $(...) used in list context (here the assignment to an array variable) is invoking the split+glob operator. You don't want the glob part, so it should be disabled (with set -o noglob), and the splitting part is done based on the list of characters in the $IFS variable.
By default, that contains space, tab and newline, but here you want to split on newline only.
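To make that concrete, here is a minimal sketch of what the default $IFS does to a command's output. The string in output is made up for illustration (it stands in for two lines of awk output) and is not taken from your file:

```shell
#! /bin/bash -
# Hypothetical two-line output; the first line contains spaces,
# the second one contains a glob character.
output=$'REQUEST_GROUP NAME="Program Request Group"\nPROGRAM *'

set -o noglob     # disable the glob part so that * is left alone
words=($output)   # unquoted: split on space, tab and newline
set +o noglob

printf '%s elements\n' "${#words[@]}"
printf '<%s>\n' "${words[@]}"
```

The two lines end up as six array elements rather than two, which is the "first word only" effect once you look at element 0.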
While you could do IFS=$'\n', that would still not work if there were empty lines in the awk output as those would be discarded.
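A small sketch of that limitation, again with a made-up output string standing in for awk output whose second line is empty:

```shell
#! /bin/bash -
# Hypothetical three-line output where the middle line is empty.
output=$'1336\n\n1334'

set -o noglob
IFS=$'\n'
lines=($output)   # split on newline only
unset -v IFS      # back to the default splitting behaviour
set +o noglob

typeset -p lines
```

Newline is an IFS whitespace character, so consecutive newlines count as a single separator: the array gets two elements, and the empty line is lost.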
To store all the lines in an array, you'd use bash's readarray builtin:
readarray -s1 rev < <(
  awk -F'[[:space:]]*[|][[:space:]]*' '{print $1}' input_file.txt
)
Or:
shopt -s lastpipe
awk -F'[[:space:]]*[|][[:space:]]*' '{print $1}' input_file.txt |
  readarray -s1 rev
(-s1 skips the first line (same as using NR>1 in awk); we include whitespace¹ around the | in the field separator, though that would still not trim the leading whitespace of the first field or the trailing whitespace of the last field, if any).
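As a self-contained sketch of the readarray approach, the following feeds awk from a hypothetical sample function instead of your file, and adds -t, which strips the trailing newline from each element (without it, every element keeps its newline):

```shell
#! /bin/bash -
# Hypothetical input: a header line, then three records,
# one of which has an empty first field.
sample() {
  printf '%s\n' \
    'REV NUM | FILE NAME' \
    '1336 | test2.txt' \
    ' | orphan.txt' \
    '1334 | test.txt'
}

readarray -t -s1 rev < <(
  sample | awk -F'[[:space:]]*[|][[:space:]]*' '{print $1}'
)

typeset -p rev
```

Here the empty first field of the third line is kept as an empty array element rather than being dropped, so array indices stay aligned with input lines.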
¹ beware that mawk (at least older versions such as 1.3.3) doesn't support POSIX character classes, so on systems that still use that awk implementation, you'd need to replace [[:space:]] with an explicit list of the whitespace characters to trim, such as [ \t\r\v\f]; mawk also doesn't support multibyte characters, so you can't include non-ASCII whitespace characters in locales using UTF-8 either.
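For awk implementations lacking POSIX character classes, the explicit-list variant could look like the sketch below; the line variable is a made-up record with a tab before the separator:

```shell
#! /bin/bash -
# Hypothetical record: a tab before the |, a space after it.
line=$'1336\t| test2.txt'

# Explicit list of ASCII whitespace characters instead of [[:space:]].
first=$(
  printf '%s\n' "$line" |
    awk -F'[ \t\r\v\f]*[|][ \t\r\v\f]*' '{print $1}'
)

printf '<%s>\n' "$first"
```

That only covers the ASCII whitespace characters listed in the bracket expression.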
REV_NUM but it has not been defined.
DL_OPS is used as an array, but it's not an array.
|, e.g. could you have an input line like 1336 |svn/Repo/PROD | test2.txt |whatever="foo|bar" |? File names can contain | (and newlines!) so could you have a file name like this|that.txt so you get an input line like 1336 |svn/Repo/PROD | this|that.txt |whatever="foo|bar" |?