-1

In Linux, how can we use the grep command to print the content inside this tag?

<errorPayload>XXXXXXXX</errorPayload>

I tried grep -Po '<errorPayload>' abc.log, but it only prints <errorPayload>

2
  • 3
    Please edit your question and add more complete examples of your input and expected output. Can XXXXXXXX contain < or >? Can XXXXXXXX be multi-line? What if you have <errorPayload>AAAA <errorPayload>AAAA </errorPayload> </errorPayload>? Is that possible? Why do you need to do this with grep? Are you open to other tools? Can there be many errorPayload tags in the file or just one? Many on the same line or just one? Commented Nov 25, 2024 at 13:37
  • 1
    This is XML, right? You'll get a more specific answer if you can provide some of the structure that surrounds this element Commented Nov 25, 2024 at 18:05

2 Answers

2

Don't use grep to parse XML or HTML.

Instead, use a proper parser:

xidel -s -e '//errorPayload/text()' file
XXXXXXXX

xmllint and xmlstarlet can do this as well, though in fewer cases (their HTML support is weak):

xmlstarlet sel -t -v '//errorPayload/text()' file
xmllint --xpath '//errorPayload/text()' file
1
  • +1 because, as I said, grep is a rotten tool for parsing XML :) Commented Nov 25, 2024 at 15:54
-1

It only prints <errorPayload> because that's what you told it to do by using the -o (--only-matching) option. From the man page, that means "Print only the matched (non-empty) parts of a matching line..."

If you want to see just the content of the tag, you need to create a regular expression that matches only the content, but not the start/end tag.

This should do it:

grep -Po '(?<=<errorPayload>).*(?=</errorPayload>)' abc.log

Given your sample input in abc.log, this produces:

XXXXXXXX

The expression (?<=<errorPayload>) is a "positive look-behind assertion": it means that the given pattern needs to match before our target expression, but is not considered part of the "matched content". The expression (?=</errorPayload>) is a "positive look-ahead assertion", which does the same thing but for a following pattern.
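
To make the difference concrete, here is a small sketch that runs both patterns against the same line. It assumes GNU grep built with PCRE support (the -P option); the variable names and the inline sample line are made up for illustration and stand in for abc.log.

```shell
# Hypothetical sample line, standing in for the contents of abc.log
sample='<errorPayload>XXXXXXXX</errorPayload>'

# Pattern is only the opening tag, so -o prints only that tag
tag_only=$(printf '%s\n' "$sample" | grep -Po '<errorPayload>')

# Look-behind and look-ahead must match around the target,
# but are not part of what -o prints
content=$(printf '%s\n' "$sample" | grep -Po '(?<=<errorPayload>).*(?=</errorPayload>)')

printf '%s\n' "$tag_only"   # <errorPayload>
printf '%s\n' "$content"    # XXXXXXXX
```

Note that PCRE look-behind needs a fixed-length pattern, which a literal tag like <errorPayload> satisfies.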

See e.g. this article for more details about look-ahead and look-behind assertions.


Caveat: grep is a rotten tool for parsing XML. The above will work as long as the XML formatting in your log files is consistent.
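
One case where "consistent" breaks down is two tags on the same line. The sketch below, again assuming GNU grep with -P and using a made-up sample line, shows that the greedy .* spans from the first opening tag to the last closing tag, while the lazy .*? stops at the nearest closing tag. This is only a partial workaround: it still does not handle nested tags or content that spans several lines.

```shell
# Hypothetical line with two errorPayload elements
line='<errorPayload>AAAA</errorPayload> <errorPayload>BBBB</errorPayload>'

# Greedy: one match running from the first opening tag to the last closing tag
greedy=$(printf '%s\n' "$line" | grep -Po '(?<=<errorPayload>).*(?=</errorPayload>)')

# Lazy: .*? stops at the nearest closing tag, so each payload is printed on its own line
lazy=$(printf '%s\n' "$line" | grep -Po '(?<=<errorPayload>).*?(?=</errorPayload>)')

printf 'greedy: %s\n' "$greedy"   # greedy: AAAA</errorPayload> <errorPayload>BBBB
printf 'lazy:\n%s\n' "$lazy"      # AAAA, then BBBB, one per line
```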

3
  • 1
    If there is more than one set of <errorPayload>...</errorPayload> on the same line, this will print the string found between the first opening <errorPayload> and the last closing </errorPayload>. Commented Nov 25, 2024 at 14:59
  • True! But it does work for the sample input provided. Commented Nov 25, 2024 at 15:54
  • Oh yes indeed. The comment was more for anyone who happens to read the answer than for you. I suspected you were well aware of that specific limitation ;) Commented Nov 25, 2024 at 16:43
