I need help extracting a value from an XML-like file that looks like this:

<line>
<Start_Time>2016-May-18 17.06.17.504</Start_Time>
<Domain>pciereg062</Domain>
<Injected_tags>
 before xml started ; AUTOMATIC-REPRODUCTION-stopped on barrier ;
</Injected_tags>
</line>

<line>
<Start_Time>2016-May-18 17.08.53.585</Start_Time>
<Domain>adv191</Domain>
<Injected_tags>port-num-0 ; port-num-0 actual-FW-14.16.0234 ;
</Injected_tags>
</line>

I want to extract the Domain value of every <line> block whose Injected_tags element contains the string "stopped on barrier". Injected_tags always comes after Domain.

Is there a simple bash utility to do this (grep, awk, sed)?

From the example above, the output should be pciereg062 and not adv191.

  • Use an XML/HTML parser (xmllint, xmlstarlet ...). Commented May 22, 2016 at 21:33
  • See: Using xmlstarlet, how do I change the value of an element Commented May 22, 2016 at 21:35
  • This is a quick and dirty solution that can easily backfire, but it should work as long as your XML input structure stays the same (Domain exactly two lines above the matching line): grep -B 2 'stopped on barrier' input.xml | grep -Po '(?<=<Domain>).*(?=</Domain>)'. You really should look into an XML parser, as Cyrus suggested. Commented May 22, 2016 at 21:40

1 Answer

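With GNU awk (needed for the multi-character RS). Every closing tag ends a record and fields are split on < and >, so for the leaf elements in this sample $2 is the tag name and $3 is its text. The script stores the latest text seen for each tag and prints the stored Domain whenever an Injected_tags record contains "stopped on barrier":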

$ awk -v RS='</[^>]+>' -F'[<>]' '{m[$2]=$3} $2=="Injected_tags" && /stopped on barrier/{print m["Domain"]}' file
pciereg062
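
If GNU awk is not available, a line-oriented variant is one possible fallback. This is only a sketch and not part of the original one-liner: the function name extract_barrier_domains is made up for illustration, and it assumes the layout shown in the question, i.e. <Domain>...</Domain> sits on a single line and comes before <Injected_tags> within each <line> block.

```shell
# Hypothetical helper (name invented for this example).
# Prints the Domain of every <line> block whose Injected_tags
# section contains "stopped on barrier".
# Assumes <Domain>...</Domain> is on one line and precedes <Injected_tags>.
extract_barrier_domains() {
  awk '
    # A new <line> block starts: forget state from the previous block.
    /<line>/ { domain = ""; in_tags = 0 }

    # Remember the text between <Domain> and </Domain>.
    /<Domain>/ {
      domain = $0
      sub(/.*<Domain>/, "", domain)
      sub(/<\/Domain>.*/, "", domain)
    }

    # Track whether the current input line is inside Injected_tags.
    /<Injected_tags>/ { in_tags = 1 }

    # Report the remembered Domain once per matching block.
    in_tags && /stopped on barrier/ { print domain; in_tags = 0 }

    /<\/Injected_tags>/ { in_tags = 0 }
  ' "$@"
}
```

Call it as extract_barrier_domains file, or pipe the data in on stdin. Because it only uses the default newline record separator, it should not depend on GNU-specific awk features, but like the one-liner it will break if the input layout changes, so a real XML parser is still the safer choice.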