bash scripting: Saving and then processing file contents as a bash variable

Question

I have a very complicated bash script that extracts results from a large output file (produced on a Linux machine, in case this is relevant). As part of this process, I use combinations of grep, head, tail, etc. to extract sub-sections from this larger file; each sub-section of text is then saved to a temporary file, which is then processed further. I have produced a simpler example here so I can frame my question, which is:

How can I avoid the need to save to this temporary file?

Rather than saving this sub-section of text to a temporary file, I would like to save it (including its newlines) to a bash variable, which can then be processed further.

The problem is that the bash scripts I am writing do not 'see' the newlines. In my example below, I have a file 'examplefile.data' containing the following text:

START_BLOCK #1
line a b c
line b
END_BLOCK #1

START_BLOCK #2
Line 1 2
Line 2 7 
 Line 3
Line 4
END_BLOCK #2

START_BLOCK #3
Line x s d e f 
END_BLOCK #3

My original script (which saves to a temporary file) works as expected, with the awk command correctly displaying the 2nd token for all lines within each 'block':

#!/bin/bash
file="examplefile.data"                         # File to process
totblock=`grep "START_BLOCK" $file | wc -l`     # Determine number of blocks of data in file

# Current implementation - which works
for ((l=1; $l<=${totblock}; l++)); do           # Loop through each block of data
  echo "BLOCK "$l

# display file contents -> extract subsection of data for current block -> Remove top and bottom -> Save to temporary file
  cat $file |                                           \
  sed -n '/START_BLOCK #'${l}'/,/END_BLOCK #'${l}'/p' | \
  grep -Ev "START|END"                                  > TEMPFILE

# Perform some rudimentary processing on this temporary file to check the overall process is working
  awk '{print $2}' TEMPFILE
done
rm TEMPFILE
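If the temporary-file version is kept, a safer variant creates the file with mktemp instead of using the fixed name TEMPFILE in the current directory, and registers a trap on EXIT so the file is deleted even if the script stops early. This is a sketch, not the original script: the function name process_blocks is hypothetical, and the loop body is the same sed/grep/awk pipeline as the working script.

```bash
#!/bin/bash
# Sketch: the temporary-file loop, with the file created safely by mktemp.
tmp=$(mktemp) || exit 1          # unique file name, typically under /tmp
trap 'rm -f "$tmp"' EXIT         # deleted when the script exits, even on error

# process_blocks FILE (hypothetical name): print the 2nd field of every
# line inside each START_BLOCK/END_BLOCK pair.
process_blocks() {
  local file=$1 totblock l
  totblock=$(grep -c "START_BLOCK" "$file")
  for ((l = 1; l <= totblock; l++)); do
    echo "BLOCK $l"
    sed -n "/START_BLOCK #$l/,/END_BLOCK #$l/p" "$file" |
      grep -Ev "START|END" > "$tmp"
    awk '{print $2}' "$tmp"
  done
}

# Run on the file named as the first argument, if one was given.
if [ $# -gt 0 ]; then
  process_blocks "$1"
fi
```

Because the file name comes from mktemp, two copies of the script running at once cannot overwrite each other's data.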

If I instead save what would have gone to TEMPFILE in a bash variable (bashvar), all the newlines are lost and the output becomes one long line. As a consequence, the awk command only shows the 2nd token of the first line, which is not what I want:

#!/bin/bash
file="examplefile.data"                         # File to process
totblock=`grep "START_BLOCK" $file | wc -l`     # Determine number of blocks of data in file

# New implementation with the aim to avoid the need to write to a temporary file (TEMPFILE)
for ((l=1; $l<=${totblock}; l++)); do
  echo "BLOCK "$l

# As above but rather than piping the output to a file, save it to a bash-variable
  bashvar=`cat $file | \
  sed -n '/START_BLOCK #'${l}'/,/END_BLOCK #'${l}'/p' | \
  grep -Ev "START|END"`

# Perform the same rudimentary test to confirm the overall process is working
  echo $bashvar | awk '{print $2}'
done

Just add " " around $bashvar, i.e. echo "${bashvar}" | awk '{print $2}'. I would also change bashvar=... to bashvar="$( cat $file ... | grep -Ev "START|END" )"; the extra surrounding "" are a good habit (a plain assignment like this one is not word-split, so leading/trailing spaces are kept either way).
– Olivier Dulac, Commented Sep 18, 2024 at 9:55
Always double quote your variables when you use them (instead of echo $bashvar use echo "$bashvar", etc.).
– Chris Davies, Commented Sep 18, 2024 at 9:58
Another approach could be: awk '/START_BLOCK/ { numblock=$2; sub("^#", "", numblock); print "BLOCK " numblock; p=1; record=""; next }; /END_BLOCK/ { process_the_record(); p=0; next }; (p==1) { record=record (record=="" ? "" : RS) $0 }' examplefile.data, process_the_record() being a function defined at the beginning of the awk script: function process_the_record() { ... }
– Olivier Dulac, Commented Sep 18, 2024 at 10:01

terdon · Accepted Answer · 2024-09-18 13:09:02Z

First of all, you really don't want to do things like this in bash or any other shell. Use a real programming language instead. It will be easier, faster and more efficient.

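That said, the reason this doesn't work for you is that you are not quoting the variable, so the shell applies split + glob (word splitting, then filename generation) to its value: the embedded newlines become word separators, and echo joins the resulting words with single spaces. So simply changing your final echo command to this would work: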

  echo "$bashvar" | awk '{print $2}'
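A minimal demonstration of the difference, using two lines of the sample data (bashvar is the variable name from the question):

```bash
#!/bin/bash
# Two lines of sample data in one variable; command substitution keeps the
# newline between them (only trailing newlines are stripped).
bashvar=$(printf 'line a b c\nline b\n')

# Unquoted: the value is split on whitespace, newline included, and echo
# rejoins the words with single spaces, so awk sees a single line.
echo $bashvar | awk '{print $2}'     # prints: a

# Quoted: the value stays one word with its newline, so awk sees two lines.
echo "$bashvar" | awk '{print $2}'   # prints: a, then b
```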

However, there are various other issues and improvements you can make here. grep can count matches itself with -c, so there is no need for wc. You should avoid var=`command` and use var=$(command) instead. You should quote all your variables. Use mktemp to create temporary files (irrelevant if you want to avoid files, but bear it in mind for next time). Avoid hardcoding file names; take them as arguments instead. Use grep -w to ensure whole-word matches (so that NOT_A_START_BLOCK is not counted as a match for START_BLOCK). You don't need cat "$file" | sed; you can pass "$file" to sed directly. There's no need for \ after a |, since you can break the line right after the |. Here's a version of your script taking all this into account:

#!/bin/bash
file="$1"                         # File to process
totblock=$(grep -wc "START_BLOCK" "$file")     # Determine number of blocks of data in file


for ((l=1; $l<=${totblock}; l++)); do           # Loop through each block of data
  echo "BLOCK $l"
  data=$(sed -n '/START_BLOCK #'"$l"'/,/END_BLOCK #'"$l"'/p' "$file" | 
         grep -Ev "START|END") 
  awk '{print $2}' <<<"$data"
done
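One limitation of the sed addresses above: /START_BLOCK #1/ also matches START_BLOCK #10, #11 and so on, so with ten or more blocks, block 1 would also pick up the lines of block 10; and grep -Ev "START|END" drops any data line that happens to contain those words. The following sketch anchors the addresses and removes only the first and last line of each range, assuming every marker line contains nothing but the marker (no trailing spaces). extract_block is a hypothetical helper name, and mapfile (bash 4 or later) stores the block one line per array element.

```bash
#!/bin/bash
# extract_block FILE N (hypothetical helper): print the lines between
# "START_BLOCK #N" and "END_BLOCK #N", without the marker lines.
# Assumes each marker is alone on its line (no trailing text or spaces).
extract_block() {
  local file=$1 n=$2
  # ^...$ anchors stop "#1" from also matching "#10", "#11", ...;
  # sed '1d;$d' removes just the two marker lines of the range.
  sed -n "/^START_BLOCK #${n}\$/,/^END_BLOCK #${n}\$/p" "$file" | sed '1d;$d'
}

# Sample input with ten blocks, written to a temporary file.
file=$(mktemp)
trap 'rm -f "$file"' EXIT
for i in {1..10}; do
  printf 'START_BLOCK #%d\nLine %d\nEND_BLOCK #%d\n\n' "$i" "$i" "$i"
done > "$file"

totblock=$(grep -c '^START_BLOCK #' "$file")
for ((l = 1; l <= totblock; l++)); do
  # One array element per line of the block; -t strips the newlines.
  mapfile -t lines < <(extract_block "$file" "$l")
  echo "BLOCK $l (${#lines[@]} line(s))"
  printf '%s\n' "${lines[@]}" | awk '{print $2}'
done
```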

Do note that this would really be better in pretty much any other language. Since you're using awk, why not do the whole thing in awk?

$ awk '{ if(/START_BLOCK/){ printf "BLOCK "} else if(/END_BLOCK/ || /^$/){next;}{ print $2}}' file 
BLOCK #1
a
b
BLOCK #2
1
2
3
4
BLOCK #3
x
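The one-liner handles each line as it arrives. If the processing needs the whole block at once (closer to what the question asks for), awk can first collect each block into a string, in the spirit of Olivier Dulac's comment on the question. This is a sketch: process_block and blocks_second_field are hypothetical names, and process_block here simply prints the second field of each stored line. The sample file is rebuilt from the question's data.

```bash
#!/bin/bash
# blocks_second_field FILE (hypothetical): one awk pass that stores each
# block in a string, then hands the whole block to process_block.
blocks_second_field() {
  awk '
    # Placeholder processing: print field 2 of every line in the block.
    function process_block(rec,    n, i, lines, f) {
      n = split(rec, lines, "\n")
      for (i = 1; i <= n; i++) {
        split(lines[i], f)        # default whitespace splitting, like $2
        print f[2]
      }
    }
    /^START_BLOCK/ { print "BLOCK " $2; rec = ""; inblock = 1; next }
    /^END_BLOCK/   { process_block(rec); inblock = 0; next }
    inblock        { rec = rec (rec == "" ? "" : "\n") $0 }
  ' "$1"
}

# Sample input from the question, in a temporary file.
file=$(mktemp)
trap 'rm -f "$file"' EXIT
printf '%s\n' 'START_BLOCK #1' 'line a b c' 'line b' 'END_BLOCK #1' '' \
  'START_BLOCK #2' 'Line 1 2' 'Line 2 7' ' Line 3' 'Line 4' 'END_BLOCK #2' '' \
  'START_BLOCK #3' 'Line x s d e f' 'END_BLOCK #3' > "$file"

blocks_second_field "$file"
```

Since the END_BLOCK rule runs before the accumulating rule, the marker lines never end up in the stored block.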

I suggest you post a new question, describe the actual processing you are doing and ask for a solution to do that. Looping over sections of a text file in bash really isn't a very good idea.

Great answer, covering many of the shortcomings in the original attempt.
– Olivier Dulac, Commented Sep 18, 2024 at 15:06
Thanks for the complete answer, and for the extra suggestions for improvement, especially on the use of grep. Regarding the use of bash, I completely agree that something like Python would be better, but for the time being it is all about modifying an existing script rather than writing and testing something completely new.
– r.g., Commented Sep 19, 2024 at 12:24
"No need for \ after a |, you can break the line on the |." Huh... never knew that. Not sure if I'll remove my backslashes, though, since a backslash explicitly proclaims continuation.
– RonJohn, Commented Sep 27, 2024 at 0:11
