user648855

How can I extract quoted strings within a variable?

I acknowledge there are superficially similar questions asked here before, but all of those I've seen are simpler than what I'm trying to achieve. Bash-only solutions are preferred.

I have a variable containing a string that looks like a comparison of some kind, and I'd like to split it into an array. The following are some examples, including how I'd like them to be split:

var='name="value"'                # arr=([0]=name [1]='=' [2]=value)
var="name != '!value='"           # arr=([0]=name [1]='!=' [2]='!value=')
var='"na=me" = value'             # arr=([0]=na=me [1]='=' [2]=value)
var='name >= value'               # arr=([0]=name [1]='>=' [2]=value)
var='name'                        # arr=([0]=name)
var='name = "escaped \"quotes\""' # arr=([0]=name [1]='=' [2]=escaped\ \"quotes\")
var="name = \"nested 'quotes'\""  # arr=([0]=name [1]='=' [2]=nested\ \'quotes\')
var="name = 'nested \"quotes\"'"  # arr=([0]=name [1]='=' [2]=nested\ \"quotes\")

You get the picture. Either side (or neither) may be quoted, with either single or double quotes. Quoted strings may contain escaped quotes or quotes of the other kind. The operator can be any of a predefined set, and operator characters may also appear inside the quoted strings, where they must be treated literally. Whitespace around the operator is optional, and there may be no operator at all.

I have to parse a lot of lines, and therefore I'd prefer not to fork a new process each time, which is why Bash-only solutions are preferred. This is an addition to an existing Bash script that does not need to be portable to other shells, and it's running on Bash 5.2, so I do have access to modern Bash features that may be helpful.

The command IFS=\" read -a arr <<<"$var" is appealing because read (without -r) processes backslash escapes, so escaped quotes survive the split; if I only had to deal with one kind of quote rather than both, I could make this work. As it stands, I'm just hoping I don't have to write a whole tokenizer algorithm in shell script, and that there's some combination of features I haven't considered which can parse this reliably.
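For illustration, here is a minimal sketch of one fork-free approach that peels tokens off the front of the string with `[[ =~ ]]` and `BASH_REMATCH`. Several things in it are assumptions rather than part of the requirements above: the operator set (`=`, `!=`, `>=`, `<=`, `<`, `>`), the function names `tokenize` and `unescape`, the use of a global `arr`, and the rule that a backslash escapes the next character inside either kind of quote.

```shell
#!/usr/bin/env bash
# Regexes live in variables so the shell does not reinterpret them in [[ =~ ]].
dq_re='^"(([^"\\]|\\.)*)"'           # double-quoted token, backslash escapes allowed
sq_re=${dq_re//\"/\'}                # same pattern with single quotes (assumed rule)
op_re='^(!=|>=|<=|=|<|>)'            # hypothetical operator set
word_re=$'^[^[:space:]"\'=!<>]+'     # bare word: stops at space, quote, or operator char

# unescape STRING: drop each backslash and keep the character after it.
# The result is returned in REPLY to avoid a command substitution (fork).
unescape() {
  local in=$1 out=
  while [[ $in == *\\* ]]; do
    out+=${in%%\\*}      # text before the first backslash
    in=${in#*\\}         # drop everything through that backslash
    out+=${in:0:1}       # keep the escaped character literally
    in=${in:1}
  done
  REPLY=$out$in
}

# tokenize STRING: fill the global array "arr"; return 1 on malformed input.
tokenize() {
  local s=$1 REPLY
  arr=()
  while :; do
    s=${s#"${s%%[![:space:]]*}"}     # strip leading whitespace
    [[ -n $s ]] || return 0
    if [[ $s =~ $dq_re || $s =~ $sq_re ]]; then
      unescape "${BASH_REMATCH[1]}"
      arr+=("$REPLY")
    elif [[ $s =~ $op_re || $s =~ $word_re ]]; then
      arr+=("${BASH_REMATCH[0]}")
    else
      return 1                       # unterminated quote or stray operator character
    fi
    s=${s:${#BASH_REMATCH[0]}}       # consume the matched text
  done
}

# Usage: inspect the result for one of the examples.
var='name = "escaped \"quotes\""'
tokenize "$var" && declare -p arr
```

The sketch does not check that the tokens form a valid "left operator right" sequence; it only splits them, so any such validation would need to be layered on top by inspecting `${#arr[@]}`.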