grep or awk to extract xml from log based on search string

Question

I have a log file in which XMLs are being logged. I need to search for and extract all the XMLs that have a specific string in any one of their nodes.

e.g. the log file will have multiple XMLs containing the search param.

randomlogentry1
randomlogentry2
Printing XML:<CreateDataABC>
    <Tag1>searchparam</Tag1>
</CreateDataABC>
randomlogentry3
randomlogentry4
randomlogentry5
Printing XML: <DataCreatedABC>
       <TagA>otherparam</TagA>
       <TagB>searchparam</TagB>
       <TagC>otherparam</TagC>
    </DataCreatedABC>
randomlogentry6
randomlogentry7

The expected output is the two XMLs printed on the console or written to separate files.

XML1:

<CreateDataABC>
     <Tag1>searchparam</Tag1>
</CreateDataABC>

XML2:

<DataCreatedABC>
     <TagA>otherparam</TagA>
     <TagB>searchparam</TagB>
     <TagC>otherparam</TagC>
</DataCreatedABC>

The position of 'searchparam' in an XML is never fixed; the only constants are the 'ABC' string and the 'searchparam'.

I thought of using sed to extract the lines between two line numbers, for which I tried the following:

Search for the searchparam and identify its line number.
Find the next occurrence of ABC and get its line number.

I can't seem to find the previous occurrence of ABC from a specific line!
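One possible way to get the previous occurrence is to search only the lines up to the given line and keep the last hit. This is a minimal sketch, not a tested solution for every log layout: `prev_match_line` is a hypothetical helper name, and it assumes the marker (here 'ABC') should be matched as a fixed string rather than a regular expression.

```shell
# Hypothetical helper (not an existing command).
# Prints the number of the last line at or before line $1 of file $3
# that contains the fixed string $2; prints nothing if there is none.
prev_match_line() {
    # head keeps lines 1..$1, grep -n numbers the matching lines,
    # tail keeps the last match, cut strips everything after the number.
    head -n "$1" "$3" | grep -nF -- "$2" | tail -n 1 | cut -d: -f1
}
```

For example, `prev_match_line 19 ABC test.log` would look for the nearest line containing 'ABC' at or above line 19 of test.log.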

Has anyone done this before?

EDIT: Updated the example log format and expected output.

Extend your example to show the surrounding parts of the XML fragment you are searching for.
– RomanPerekhrest, Commented May 25, 2018 at 8:31

Siva · Accepted Answer · 2018-05-25 10:57:19Z

Try this:

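# Count the lines that start with "Printing"
Max=$(grep -c "^Printing" file.xml)

# For each count, write sed's output to <count>.xml with the "Printing XML:" prefix removed
for count in $(seq 1 "$Max")
do
    sed -nr '/Printing/H;//,/ABC/G;s/\n(\n[^\n]*){'"$count"'}$//p' file.xml | sed 's/Printing XML://' > "$count.xml"
done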

edited May 25, 2018 at 10:57

answered May 25, 2018 at 8:35

Siva


Thanks, I have updated the question with more details. What I need to do is extract the entire XML out of a text log file...
– Saravanakumar Mohan, Commented May 25, 2018 at 9:35

Please share the expected output, as I did in my answer.
– Siva, Commented May 25, 2018 at 10:03

I have updated the question with the exact expected output.
– Saravanakumar Mohan, Commented May 25, 2018 at 10:09

Try my updated answer.
– Siva, Commented May 25, 2018 at 10:58

Thanks Siva. As I mentioned, the only constants are 'ABC' and 'searchparam', so I cannot depend on the presence of 'Printing'.
– Saravanakumar Mohan, Commented May 25, 2018 at 11:28


Saravanakumar Mohan · Accepted Answer · 2018-05-29 11:44:13Z

Here is what I wrote, but I am sure there is a shorter and more elegant way of doing this.

searchstring=searchparam
filename=test.log
pattern1=ABC

# Line numbers of all lines that contain the search string
linenums=($(grep -n "${searchstring}" "${filename}" | awk -F":" '{print $1}'))
len=${#linenums[@]}

for (( i=0; i<len; i++ ))
do
  currentline=${linenums[$i]}
  # Lines at or after the match that contain the pattern, numbered relative to the match
  relativeendlinearray=($(tail -n +"${currentline}" "${filename}" | grep -n "${pattern1}" | awk -F":" '{print $1}'))
  actualendline=$((currentline + ${relativeendlinearray[0]} - 1))

  # Walk backwards from the match to the nearest line that contains the pattern
  index=$currentline
  while [ "$index" -ne 0 ]
  do
    found=$(sed "${index}q;d" "${filename}" | grep "${pattern1}")
    if [ -n "$found" ]; then
      actualstartline=$index
      break
    fi
    index=$((index - 1))
  done

  if [ -n "$found" ]; then
    echo ""
  else
    echo "Log break detected, content across multiple files"
  fi

  echo "Start Line" "${actualstartline}"
  echo "Current Line" "${currentline}"
  echo "End Line" "${actualendline}"
  sed -n "${actualstartline},${actualendline}p" "${filename}"
done
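
A shorter variant might do the same in a single awk pass by buffering each block and printing it only when it contains the search string. The following is only a sketch under stated assumptions: `extract_xml` is a hypothetical function name, the opening and closing tags are assumed to sit on separate lines, and those two lines are assumed to be the only lines of a block that contain 'ABC'. It does not rely on the 'Printing XML:' prefix; it simply drops whatever precedes the first '<' on the opening line.

```shell
# Hypothetical helper: print every XML block that contains the search string.
# $1 = search string, $2 = marker found in the opening and closing tags, $3 = log file
# Assumption: within a block, only the opening and closing tag lines contain the marker.
extract_xml() {
    awk -v s="$1" -v m="$2" '
        index($0, m) {
            if (!inblock) {               # line with the opening tag
                inblock = 1
                sub(/^[^<]*/, "")         # drop any log prefix before the first "<"
                buf = $0
                hit = index($0, s)        # is the search string already on this line?
            } else {                      # line with the closing tag
                buf = buf "\n" $0
                if (hit || index($0, s)) print buf
                inblock = 0
            }
            next
        }
        inblock {                         # line inside a block: buffer it
            buf = buf "\n" $0
            if (index($0, s)) hit = 1
        }
    ' "$3"
}
```

With the variables from the script above it could be called as `extract_xml "$searchstring" "$pattern1" "$filename"`. Because `index()` compares plain strings, the search string is not interpreted as a regular expression.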
