Consider a file of key=value pairs in which each key may be a comma-separated list of several keys. In other words, many keys can map to one value. The reason is that each key is a short word compared to the length of the value, so the data is 'compressed' into fewer lines.
Illustration (i.e. not the real values):
$ cat testfile
AA,BB,CC=a-lengthy-value
A,B,C=a-very-long-value
D,E,F=another-very-long-value
K1,K2,K3=many-many-more
Z=more-long-value
It is valid to assume that all keys are unique, and will not contain the following characters:
- key delimiter: ,
- key-value delimiter: =
- whitespace characters
Keys may take any form in the future (within the above constraints), but they currently happen to match the following regex: [[:upper:]]{2}[[:upper:]0-9]. Likewise, values will not contain =, so = can safely be used to split each line. There are no multi-line keys or values, so it is also safe to process the file line by line.
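To illustrate what these constraints allow, here is a minimal bash sketch that splits one line into its keys and its value. The sample line and the variable names (line, keys, value, key_array) are only for illustration and are not part of the actual script.

```shell
# Hypothetical sample line in the format described above.
line='K1,K2,K3=many-many-more'

# Values never contain '=', so the first '=' separates the key list from the value.
keys=${line%%=*}
value=${line#*=}

# Keys never contain ',', so the key list can be split on it into an array.
IFS=',' read -r -a key_array <<< "$keys"

# printf reuses its format string for each array element.
printf 'key: %s\n' "${key_array[@]}"
printf 'value: %s\n' "$value"
```

This relies on bash-specific features (arrays and here-strings), which is acceptable here since portability is not a requirement.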
To facilitate data extraction from this file, a function getval() is defined as follows:
getval() {
    sed -n "/^\([^,]*,\)*$1\(,[^=]*\)*=\(.*\)$/{s//\3/p;q}" testfile
}
With this definition, calling getval A returns the value a-very-long-value, not a-lengthy-value. It should also return nothing for a non-existent key.
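For reference, the expected behaviour can be reproduced with the illustrative file above. This is only a sketch: it writes testfile into the current directory, assumes GNU sed (as shipped with Cygwin), and the keys K2, Z and Q are picked arbitrarily for the demonstration.

```shell
# Recreate the illustrative file (not the real values).
cat > testfile <<'EOF'
AA,BB,CC=a-lengthy-value
A,B,C=a-very-long-value
D,E,F=another-very-long-value
K1,K2,K3=many-many-more
Z=more-long-value
EOF

# Same definition as above; $1 is interpolated into the sed pattern as-is.
getval() {
    sed -n "/^\([^,]*,\)*$1\(,[^=]*\)*=\(.*\)$/{s//\3/p;q}" testfile
}

getval A     # a-very-long-value (the AA,BB,CC line is not matched)
getval K2    # many-many-more (key in the middle of a list)
getval Z     # more-long-value (single key on its line)
getval Q     # prints nothing, since Q is not a key in the file
```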
Questions:
- Is the current definition of getval() robust enough?
- Are there alternative ways of performing the data extraction that are possibly shorter/more expressive/more restrictive?
For what it's worth, this script will run with Cygwin's bash and the coreutils that come with it. As a result, portability is not required here (i.e. only brownie points will be given). Thanks!
edit:
Corrected function, added clarification about the keys.
edit 2:
Added clarification about the format (no multi-lines) and portability (not a requirement).
$1 should be the key to search for, e.g. A will only map to a-very-long-value. There can be lines like AA,BB,CC=a-lengthy-value, but that should not be a match, because the key to search for is A and not AA.