I have a super complicated bash script that extracts results from a large output file (produced on a Linux machine, in case this is relevant). As part of this process, I use combinations of grep, head, tail, etc. to extract sub-sections from this larger file; each sub-section of text is saved to a temporary file, which is then processed further. I have produced a simpler example here to frame my question, which is:
How can I avoid the need to save to this temporary file?
What I would like to do is, rather than save this sub-section of text to a temporary file, save it (including its newlines) to a bash variable which can then be processed further.
The problem is that the bash scripts I am writing do not 'see' the newlines. In my example below, I have a file 'examplefile.data' containing the following text:
START_BLOCK #1
line a b c
line b
END_BLOCK #1
START_BLOCK #2
Line 1 2
Line 2 7
Line 3
Line 4
END_BLOCK #2
START_BLOCK #3
Line x s d e f
END_BLOCK #3
My original script (which saves to a temporary file) works as expected, with the awk command correctly displaying the 2nd token for all lines within each 'block':
#!/bin/bash
file="examplefile.data" # File to process
totblock=`grep "START_BLOCK" $file | wc -l` # Determine number of blocks of data in file
# Current implementation - which works
for ((l=1; $l<=${totblock}; l++)); do # Loop through each block of data
    echo "BLOCK "$l
    # display file contents -> extract subsection of data for current block -> Remove top and bottom -> Save to temporary file
    cat $file | \
        sed -n '/START_BLOCK #'${l}'/,/END_BLOCK #'${l}'/p' | \
        grep -Ev "START|END" > TEMPFILE
    # Perform some rudimentary processing on this temporary file to check the overall process is working
    awk '{print $2}' TEMPFILE
done
rm TEMPFILE
If I then attempt to save what would have been saved to TEMPFILE to a bash variable (bashvar), all newlines are lost, resulting in one long line. As a consequence, the awk command only shows the 2nd token of the first line, which is not what I want:
#!/bin/bash
file="examplefile.data" # File to process
totblock=`grep "START_BLOCK" $file | wc -l` # Determine number of blocks of data in file
# New implementation with the aim to avoid the need to write to a temporary file (TEMPFILE)
for ((l=1; $l<=${totblock}; l++)); do
    echo "BLOCK "$l
    # As above but rather than piping the output to a file, save it to a bash-variable
    bashvar=`cat $file | \
        sed -n '/START_BLOCK #'${l}'/,/END_BLOCK #'${l}'/p' | \
        grep -Ev "START|END"`
    # Perform the same rudimentary test to confirm the overall process is working
    echo $bashvar | awk '{print $2}'
done
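For what it's worth, the symptom described here can be reproduced in isolation. The sketch below is a minimal illustration, not part of the original script: it assumes a hypothetical two-line value standing in for one extracted block, and compares an unquoted expansion of the variable with a quoted one. The variable itself still holds the newlines; it is the unquoted expansion that drops them.

```shell
#!/bin/bash
# Hypothetical two-line sample standing in for one extracted block
bashvar=$(printf 'Line 1 2\nLine 2 7\n')

# Unquoted: the shell splits the value on whitespace (newlines included)
# and echo rejoins the words with single spaces, giving one long line.
unquoted=$(echo $bashvar | awk '{print $2}')

# Quoted: the value reaches echo as a single argument, newlines intact,
# so awk sees one record per original line.
quoted=$(echo "$bashvar" | awk '{print $2}')

printf 'unquoted:\n%s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

With this sample, the unquoted pipeline yields only the 2nd token of the first line, while the quoted one yields the 2nd token of every line.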
Have you tried quoting $bashvar? i.e.: echo "${bashvar}" | awk '{print $2}'. And I would also change the bashvar=.... with bashvar="$( cat $file ....... | grep -Ev "START|END" )"; the extra surrounding "" will help keep beginning/ending spaces.
Instead of echo $bashvar use echo "$bashvar", etc.
Alternatively, in awk alone:
awk '/START_BLOCK/ { numblock=$2; sub("^#", "", numblock); print "BLOCK " numblock; p=1; record=""; next }
     /END_BLOCK/ { process_the_record(); p=0; next }
     (p==1) { record = record (record == "" ? "" : RS) $0 }' examplefile.data
process_the_record() being a function defined at the beginning of the awk script: function process_the_record() { ... }
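Putting the quoting suggestion together, a version of the loop without TEMPFILE might look like the sketch below. Some parts are assumptions rather than anything from the question: the helper name extract_block is hypothetical, the sample data is written to a mktemp file only so the sketch runs on its own, and the sed patterns are anchored with $ so that block #1 would not also match a block #10 in a larger file.

```shell
#!/bin/bash
# Recreate the sample file from the question so this sketch runs on its own.
file=$(mktemp)
trap 'rm -f "$file"' EXIT
cat > "$file" <<'EOF'
START_BLOCK #1
line a b c
line b
END_BLOCK #1
START_BLOCK #2
Line 1 2
Line 2 7
Line 3
Line 4
END_BLOCK #2
START_BLOCK #3
Line x s d e f
END_BLOCK #3
EOF

# Hypothetical helper: print the body of block number $2 from file $1,
# without its START_BLOCK/END_BLOCK lines.
extract_block() {
    sed -n "/START_BLOCK #$2\$/,/END_BLOCK #$2\$/p" "$1" | grep -Ev "START|END"
}

totblock=$(grep -c "START_BLOCK" "$file")   # Number of blocks in the file

for ((l = 1; l <= totblock; l++)); do
    echo "BLOCK $l"
    # Capture the block in a variable instead of writing TEMPFILE
    bashvar=$(extract_block "$file" "$l")
    # Quoting "$bashvar" keeps one line per record for awk
    printf '%s\n' "$bashvar" | awk '{print $2}'
done
```

printf '%s\n' is used here instead of echo so that a block starting with something like -n is not taken as an option; echo "$bashvar" behaves the same for the sample data.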