
I have the following string, e.g.:

2017-01-19:31:51 [ABCD:] 37723 - MATCH: 10 [text]

I want to find MATCH and print its value, 10, using awk. I can do this with traditional grep and cut, but I want to find a way using sed or awk.

MATCH can be at any position on the line.

2 Answers

sed -n 's/.* MATCH: \([^ ]*\).*/\1/p'

Would print the sequence of non-space characters that follow the right-most occurrence of " MATCH: " on every matching line.

-n tells sed not to print the pattern space by default. And the p flag to the s command tells sed to print the pattern space (so the result of the substitution) if the substitution is successful.

So the:

sed -n 's/pattern/replacement/p'

is a common idiom to print the result of successful substitutions.
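To see what -n and the p flag each contribute, here is a small sketch that feeds two lines, only one of which contains " MATCH: " (the input lines and the sample() helper are made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: emit two illustrative lines, only the first contains " MATCH: ".
sample() {
  printf '%s\n' \
    '2017-01-19:31:51 [ABCD:] 37723 - MATCH: 10 [text]' \
    '2017-01-19:31:51 [ABCD:] 37723 - [text]'
}

# Without -n, every line is printed: the first one after substitution,
# the second one unchanged because the s command did not match it.
sample | sed 's/.* MATCH: \([^ ]*\).*/\1/'
# 10
# 2017-01-19:31:51 [ABCD:] 37723 - [text]

# With -n plus the p flag, only successfully substituted lines are printed.
sample | sed -n 's/.* MATCH: \([^ ]*\).*/\1/p'
# 10
```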

Note that the above assumes the input is valid text. Since .* matches any sequence of characters, it won't match on sequences of bytes that don't form valid characters. That typically happens in UTF-8 locales when processing text in another encoding. If you're in such a case, you may want to prefix that line above with LC_ALL=C. That makes sed treat each byte as a character so there's no possible invalid byte sequence. That would work here as the characters we're matching on are all from the portable character set.
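A sketch of the LC_ALL=C variant. The \377 byte below is an arbitrary stand-in for text in some non-UTF-8 encoding; it is not a valid UTF-8 sequence, but in the C locale sed simply treats it as one more character:

```shell
#!/bin/sh
# Build a line containing a lone 0xFF byte (octal 377), which is not valid
# UTF-8, followed by the usual " MATCH: " field. The byte is an assumption
# chosen only to simulate input in another encoding.
# LC_ALL=C applies to sed only, so .* can match across the 0xFF byte.
printf 'caf\377 - MATCH: 10 [text]\n' |
  LC_ALL=C sed -n 's/.* MATCH: \([^ ]*\).*/\1/p'
# 10
```

Note that LC_ALL=C goes in front of sed, not in front of printf, because it is sed that has to interpret the bytes.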

Standard awk doesn't have anything equivalent, as its sub() function doesn't support capture groups (the \(...\) captured in \1 above).

There, you need to resort to the match() function:

awk 'match($0, / MATCH: [^ ]*/) {
       print substr($0, RSTART+8, RLENGTH-8)}'
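match() sets RSTART to the position where the regex match begins and RLENGTH to its length. The 8 above is the length of " MATCH: " (leading space, MATCH, colon, trailing space). A sketch that prints those two variables for the sample line, then derives the offset with length() instead of hard-coding it (the variable name key is illustrative, not from the answer):

```shell
#!/bin/sh
line='2017-01-19:31:51 [ABCD:] 37723 - MATCH: 10 [text]'

# Show what match() stores: " MATCH: 10" starts at character 33
# and is 10 characters long.
printf '%s\n' "$line" | awk 'match($0, / MATCH: [^ ]*/) { print RSTART, RLENGTH }'
# 33 10

# Same extraction, with the prefix length computed rather than hard-coded.
# key is used as a dynamic regex; that is only safe because " MATCH: "
# contains no regex metacharacters.
printf '%s\n' "$line" | awk -v key=' MATCH: ' '
  match($0, key "[^ ]*") {
    print substr($0, RSTART + length(key), RLENGTH - length(key))
  }'
# 10
```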

Or use tricks like:

awk -F ' MATCH: ' 'NF>1 {sub(/ .*/, "", $2); print $2}'

(beware that those would consider the leftmost occurrence of " MATCH: ").
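To see the leftmost/rightmost difference, a sketch running the sed command and the two awk commands on a made-up line that contains " MATCH: " twice:

```shell
#!/bin/sh
# Made-up input with two occurrences of " MATCH: ".
line='a MATCH: 1 b MATCH: 2 c'

# sed: the greedy leading .* swallows as much as it can,
# so the capture comes from the right-most occurrence.
printf '%s\n' "$line" | sed -n 's/.* MATCH: \([^ ]*\).*/\1/p'
# 2

# awk match(): reports the left-most occurrence.
printf '%s\n' "$line" | awk 'match($0, / MATCH: [^ ]*/) {
       print substr($0, RSTART+8, RLENGTH-8)}'
# 1

# awk -F: $2 is whatever follows the first separator.
printf '%s\n' "$line" | awk -F ' MATCH: ' 'NF>1 {sub(/ .*/, "", $2); print $2}'
# 1
```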

GNU awk has a gensub() function with functionality similar to sed's s command, but with a design flaw: it doesn't tell you whether any substitution was done. Here, you could do:

 gawk '(replacement = gensub(/.* MATCH: ([^ ]*).*/, "\\1", 1)) != $0 {
   print replacement}'
  • Getting some different results using this approach. Commented Jan 23, 2017 at 10:47

Given the assumption that all your lines are formatted the same way (or at least all the lines containing MATCH:), MATCH: appears to be the 5th field of the line, and the value you want is the 6th.

Therefore, in awk you just have to test whether the 5th field equals MATCH: and, if so, print the 6th field.

$ echo "2017-01-19:31:51 [ABCD:] 37723 - MATCH: 10 [text]" | awk '{ if ($5 == "MATCH:") print $6 }'
10
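With the default field separator, awk splits each line on runs of blanks, which is why MATCH: lands in $5 and the value in $6. A quick sketch for checking the field numbering of a given line format:

```shell
#!/bin/sh
# Print each whitespace-separated field with its number.
echo "2017-01-19:31:51 [ABCD:] 37723 - MATCH: 10 [text]" |
  awk '{ for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }'
# 1: 2017-01-19:31:51
# 2: [ABCD:]
# 3: 37723
# 4: -
# 5: MATCH:
# 6: 10
# 7: [text]
```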

EDIT: If MATCH: can be anywhere in the line:

$ echo "2017-01-19:31:51 [ABCD:] 37723 - MATCH: 10 [text]" | awk '{ for (x=1; x<NF; x++ ) { if ($x == "MATCH:") {x=x+1; printf("%s\n", $x); break}}}'
10

It might not be very elegant, but you need to iterate through all the fields of the line and test each one, which is done with a for loop and an if test. If the tested field matches, print the next field.

I added a break so that, once a match is found, awk stops iterating over the current line's fields and jumps directly to the next line.
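To see what the break changes, a sketch on a made-up line with two MATCH: fields. Because the loop condition is x<NF, the last field is never tested, so the field after a match always exists:

```shell
#!/bin/sh
# Made-up input with two MATCH: fields.
line='a MATCH: 1 b MATCH: 2 c'

# With break: stop scanning this line after the first hit.
printf '%s\n' "$line" | awk '{ for (x=1; x<NF; x++ ) { if ($x == "MATCH:") {x=x+1; printf("%s\n", $x); break}}}'
# 1

# Without break: keep scanning and print the value after every MATCH:.
printf '%s\n' "$line" | awk '{ for (x=1; x<NF; x++ ) { if ($x == "MATCH:") {x=x+1; printf("%s\n", $x)}}}'
# 1
# 2
```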

On a multi-line file:

$ cat terst 
2017-01-19:31:51 [ABCD:] 37723 - MATCH: 10 [text]
2017-01-19:31:51 [ABCD:] 37723 - MATCH: 11 [text]
2017-01-19:31:51 [ABCD:] 37723 - [text]
2017-01-19:31:51 37723 - MATCH: 12 [text]
$ awk '{ for (x=1; x<NF; x++ ) { if ($x == "MATCH:") {x=x+1; printf("%s\n", $x); break}}}' terst
10
11
12
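The loop can be made reusable by passing the field to look for as an awk variable instead of hard-coding "MATCH:". A sketch, assuming the token is always a whole whitespace-separated field as above (the variable name key is made up; -v itself is standard awk):

```shell
#!/bin/sh
# Same loop as above, but the field to search for comes from -v key=...
printf '%s\n' \
  '2017-01-19:31:51 [ABCD:] 37723 - MATCH: 10 [text]' \
  '2017-01-19:31:51 [ABCD:] 37723 - [text]' \
  '2017-01-19:31:51 37723 - MATCH: 12 [text]' |
  awk -v key='MATCH:' '{
    for (x = 1; x < NF; x++)
      if ($x == key) { print $(x+1); break }
  }'
# 10
# 12
```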
  • Just to add: MATCH: can be at any position, so is there a more generic approach, like searching for the string and printing the value? Commented Jan 23, 2017 at 10:39
  • Please edit your original post to add this information. It might not be so obvious from the current text. Commented Jan 23, 2017 at 11:04
