I have the following problem: I want to extract text that is inside brackets from a string (with or without the brackets). My string looks like this:
STR="[1] [2][345] [678 9] foo bar"
I initially wanted to use bash regex and BASH_REMATCH. I ended up using the following code:
regex='\[([^\]]*)\](.*)'
MATCHES=()
STR="[1] [2][345] [678 9] foo bar"
while [[ -n $STR && $STR =~ $regex ]];
do
MATCHES+=("${BASH_REMATCH[1]}")
STR=${BASH_REMATCH[2]}
echo -e "matches: ${BASH_REMATCH[1]} -> ${BASH_REMATCH[2]}"
done
This partly worked, but it only captured a single character inside the brackets, so [345] resulted in 3.
I could not figure out why that was happening, so I ended up using grep with PCRE after all. My current solution is:
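For comparison, here is a minimal sketch of the same loop with the bracket expression written in POSIX ERE style. It relies on one property of POSIX regular expressions: inside a bracket expression a backslash is an ordinary character, so `]` is excluded by placing it first after `^` rather than by escaping it. Whether this fully accounts for the behavior described above is an assumption; it has not been checked against the exact bash version in use.

```bash
#!/usr/bin/env bash
# Sketch: same loop as above, only the bracket expression differs.
# [^]] means "any character except ]"; the ] comes first, unescaped.
regex='\[([^]]*)\](.*)'
MATCHES=()
STR="[1] [2][345] [678 9] foo bar"

while [[ -n $STR && $STR =~ $regex ]]; do
    MATCHES+=("${BASH_REMATCH[1]}")   # text inside the brackets
    STR=${BASH_REMATCH[2]}            # remainder, scanned on the next pass
done

# Print one captured group per line, wrapped in <> to make spaces visible.
printf '<%s>\n' "${MATCHES[@]}"
```

With the sample string, this collects four elements, and the one containing a space stays a single element because the array expansion is quoted.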
regex="\[[^\]]*?\]"
if [[ $(grep -o '\[.*\]' <<< $STR) ]];
then
MATCHES=$(grep -oP "$regex" <<< $STR)
else
echo "No special flags provided."
exit 0
fi
I then proceed to a for loop:
for arg in $MATCHES;
do
echo $arg
done
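A possible alternative to looping over an unquoted string is to store each match as its own array element. The sketch below assumes bash 4 or later (for `mapfile`) and a grep that supports `-o`; the BRE pattern is an illustrative stand-in for the PCRE one above, not a drop-in equivalent in every case.

```bash
#!/usr/bin/env bash
# Sketch: collect one match per array element instead of one big string.
# Assumes bash 4+ (mapfile) and a grep that supports -o.
STR="[1] [2][345] [678 9] foo bar"

# grep -o prints each match on its own line; mapfile -t stores each line
# as a separate array element, so spaces inside a match are preserved.
# Pattern (BRE): a literal "[", any run of non-"]" characters, then "]".
mapfile -t MATCHES < <(grep -o '\[[^]]*]' <<< "$STR")

if (( ${#MATCHES[@]} == 0 )); then
    echo "No special flags provided."
    exit 0
fi

# Quoting "${MATCHES[@]}" expands to exactly one word per element,
# so neither IFS nor globbing affects the loop.
for arg in "${MATCHES[@]}"; do
    printf '%s\n' "$arg"
done
```

Because the loop iterates over a quoted array expansion, no IFS change is needed for this variant.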
The problem is that the loop does not split the fields the way I want. I used hexdump to find out the actual delimiter:
hexdump -C <<< $MATCHES
To my surprise, this showed that the delimiter is hex 0a, the LF character. That should not be a problem, since I know the for loop uses IFS for splitting, so I set IFS to LF with IFS=$'\n'. According to hexdump, however, that set the value of IFS to 0a0a, and the loop still did not work. I then tried IFS='' and, according to hexdump, that set the value to 0a. That did not work either: the for loop did not change its behavior. Perhaps the scope of IFS was not set correctly by my script?
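The following sketch may help separate two effects in the observations above. It relies on the documented fact that a here-string (`<<<`) appends a newline to the text it passes on, so it compares that with `printf '%s'`, which adds nothing. `od` is used instead of hexdump only because it is specified by POSIX; the variable names and the sample value of MATCHES are illustrative assumptions.

```bash
#!/usr/bin/env bash
IFS=$'\n'

# A here-string adds a trailing newline, so the dump shows one byte more
# than the variable actually holds. printf '%s' passes the bytes unchanged.
with_herestring=$(od -An -tx1 <<< "$IFS" | tr -d ' \n')
with_printf=$(printf '%s' "$IFS" | od -An -tx1 | tr -d ' \n')
echo "here-string: $with_herestring"
echo "printf:      $with_printf"

# Word splitting with IFS set to LF, in the same shell, before the loop.
# set -f disables globbing, since [1] would otherwise be a glob pattern.
set -f
MATCHES=$'[1]\n[2]\n[345]\n[678 9]'   # stand-in for the grep output
count=0
for arg in $MATCHES; do               # unquoted on purpose: relies on IFS
    count=$((count + 1))
    printf '%s\n' "$arg"
done
set +f
echo "fields: $count"
```

If IFS is assigned in a subshell or in a different script than the loop, the loop will not see the change, which is one thing worth checking.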
My questions are the following:
1) Why did the original bash-only regex approach not work? Why was it capturing only a single character? regex101.com showed the expected behavior, but then again, it does not provide a bash regex mode.
2) Why does setting IFS not work as I expected? hexdump shows an "extra" LF, even when I set IFS to empty.
3) Why does IFS not seem to affect the for loop?
4) Is there a simpler way to tackle the original problem: extracting [foo] [bar] [foo bar] from strings like [foo] [bar] 1 asdf[foo bar], in a way that lets me loop over each bracket pair?
Bonus question!
B) I am confused about when I should enclose a variable or expression in single or double quotes. I have read a bit about globbing and parameter expansion and am now looking for something more in-depth. Any recommendations?