Extracting values from a file keyed by multiple keys

Question

Consider a file of key=value pairs in which each key may be a comma-separated list of several keys. In other words, many keys can map to one value. The reason is that each key is a short word compared to the length of its value, so the data is 'compressed' into fewer lines.

Illustration (i.e. not the real values):

$ cat testfile
AA,BB,CC=a-lengthy-value
A,B,C=a-very-long-value
D,E,F=another-very-long-value
K1,K2,K3=many-many-more
Z=more-long-value

It is valid to assume that all keys are unique, and will not contain the following characters:

key delimiter: ,
key-value delimiter: =
whitespace characters

Keys may come in any form in the future (within the above constraints); they currently happen to match the following regex: [[:upper:]]{2}[[:upper:]0-9]. Likewise, values will not contain =, so = can be safely used to split each line. There are no multi-line keys or values, so it is also safe to process the file line by line.

To facilitate data extraction from this file, a function getval() is defined as follows:

getval() {
    sed -n "/^\([^,]*,\)*$1\(,[^=]*\)*=\(.*\)$/{s//\3/p;q}" testfile
}

Calling getval A should therefore return the value a-very-long-value, not a-lengthy-value. It should also return nothing for a non-existent key.

Questions:

Is the current definition of getval() robust enough?
Are there alternative ways of performing the data extraction that are possibly shorter/more expressive/more restrictive?

For what it's worth, this script will run with Cygwin's bash and the coreutils that come with it. As a result, portability is not required here (i.e. it will only earn brownie points). Thanks!

edit:

Corrected function, added clarification about the keys.

edit 2:

Added clarification about the format (no multi-lines) and portability (not a requirement).

Well, in the example, A will only map to a-very-long-value. There can be lines like AA,BB,CC=a-lengthy-value but that should not be a match, because the key to search for is A and not AA.
– h.j.k., Commented Jan 12, 2015 at 11:22
@mikeserv yups. A comma will always be followed by another key. :)
– h.j.k., Commented Jan 12, 2015 at 11:32

jimmij · Accepted Answer · 2015-01-12 12:48:04Z

2

You can write it in a much more readable form using awk:

getval() {
    awk -F'=' '$1~/\<'"$1"'\>/{print $2}' testfile
}
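
As a hedged alternative sketch (not part of the answer above): the \< and \> word-boundary operators are GNU awk extensions, and interpolating $1 into the pattern means any regex metacharacter in the key changes what is matched. Assuming only a POSIX awk, the key list can instead be split on commas and each key compared as a plain string. The function name getval_exact, the KEY environment variable, and the optional second argument naming the file are illustrative assumptions, not anything from the question.

```shell
getval_exact() {
    # Hypothetical helper: $1 is the key, $2 an optional file (defaults to testfile).
    # The key is passed through the environment, so awk never treats it as a
    # regex and never processes backslash escapes in it.
    KEY="$1" awk -F= '
        {
            # Split the part before the first "=" into individual keys.
            n = split($1, keys, ",")
            for (i = 1; i <= n; i++)
                if (keys[i] == ENVIRON["KEY"]) {
                    # Print everything after the first "=" and stop at the first hit.
                    print substr($0, index($0, "=") + 1)
                    exit
                }
        }
    ' "${2:-testfile}"
}
```

With the sample file from the question, getval_exact K3 prints many-many-more, and a key that is not present prints nothing.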

edited Jan 12, 2015 at 12:48

answered Jan 12, 2015 at 10:53


I made a significant change to getval() after your posting to fix a bug, as such I'm afraid this is not what I'm looking for. Could you update your answer if possible? Thanks! :) getval() looks right and I don't think I'll update it any further.
– h.j.k., Commented Jan 12, 2015 at 11:31
That looks okay to me, but you could mark the start of the match using (^|,), to avoid partial matches.
– muru, Commented Jan 12, 2015 at 11:57
@muru Good point! I only took care of substring at the end of the key with [,=], forgot to do the same at the beginning. Thanks.
– jimmij, Commented Jan 12, 2015 at 12:06
This is quite close, but fails for K3 in the example. :(
– h.j.k., Commented Jan 12, 2015 at 12:28
@h.j.k. OK see the update, hopefully the last one. I changed pattern match beginning \< and end of the word \>.
– jimmij, Commented Jan 12, 2015 at 12:48


mikeserv · Accepted Answer · 2015-01-12 18:49:19Z

1

With sed...

getval() { sed "/^\([^=]*,\)*$1[,=]/!d;s/.*=//;q"; } <infile

You might want to work on validating $1 as input though.
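
One possible way to do that validation, as a sketch under stated assumptions rather than a definitive implementation: reject keys that break the constraints given in the question (empty, or containing a comma, an equals sign, or whitespace), then backslash-escape the characters that are special in a basic regular expression before handing the key to sed. The name getval_checked, the exit status 2, and the optional second argument naming the file are hypothetical choices made for this example.

```shell
getval_checked() {
    # Hypothetical helper: $1 is the key, $2 an optional file (defaults to testfile).
    local esc
    case $1 in
        '' | *[,=[:space:]]*)
            # The question rules these out for keys, so refuse them outright.
            printf 'getval_checked: invalid key\n' >&2
            return 2
            ;;
    esac
    # Escape BRE metacharacters and the "/" delimiter so the key matches literally.
    esc=$(printf '%s\n' "$1" | sed 's/[][\.*^$/]/\\&/g')
    # Print the value of the first line whose key list contains the key, then quit.
    sed -n "/^\([^=]*,\)*${esc}[,=]/{s/[^=]*=//p;q;}" "${2:-testfile}"
}
```

With this in place a key such as A.B only matches a literal A.B, not AxB, and getval_checked 'A,B' fails with status 2 instead of building an unintended pattern.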

Or with GNU grep and cut:

getval() { grep -Em1 "^([^=]*,)*$1[,=]" | cut -d= -f2-; } <infile

edited Jan 12, 2015 at 18:49

answered Jan 12, 2015 at 11:18


yeah. Definitely brownie points over the gawk-specific solution provided by @jimmij.
– h.j.k., Commented Jan 12, 2015 at 15:59
@h.j.k. - well, in fairness, most of my first ideas sucked. But then I took your \(subexpression\)* and jimmij's [,=] alternation and put them together ok.
– mikeserv, Commented Jan 12, 2015 at 16:04
