
I have a log file in which XML documents are logged. I need to search for and extract every XML document that contains a specific string in any one of its nodes.

e.g. the log file will have multiple XMLs containing the search param.

randomlogentry1
randomlogentry2
Printing XML:<CreateDataABC>
    <Tag1>searchparam</Tag1>
</CreateDataABC>
randomlogentry3
randomlogentry4
randomlogentry5
Printing XML: <DataCreatedABC>
       <TagA>otherparam</TagA>
       <TagB>searchparam</TagB>
       <TagC>otherparam</TagC>
    </DataCreatedABC>
randomlogentry6
randomlogentry7

The expected output is the two XMLs printed on the console or written to separate files.

XML1:

<CreateDataABC>
     <Tag1>searchparam</Tag1>
</CreateDataABC>

XML2:

<DataCreatedABC>
     <TagA>otherparam</TagA>
     <TagB>searchparam</TagB>
     <TagC>otherparam</TagC>
</DataCreatedABC>

The position of 'searchparam' in an XML is never fixed; the only constants are the 'ABC' string (which appears in the opening and closing root tags) and the 'searchparam'.

I thought of using sed to extract the lines between two line numbers, for which I tried the following:

  1. Search for the searchparam and identify its line number.
  2. Find the next occurrence of ABC and get its line number.

However, I can't seem to find the previous occurrence of ABC before a specific line!
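For illustration, the two lookups described above (nearest 'ABC' line before a given line, and nearest one after it) could be sketched with awk as below. The function names `prev_match` and `next_match` are hypothetical helpers, not part of any tool, and the sketch assumes plain substring matching rather than regular expressions.

```shell
# Hypothetical helpers for illustration only.

# prev_match FILE LINE PATTERN
# Prints the number of the last line at or before LINE that contains PATTERN.
prev_match() {
    awk -v n="$2" -v pat="$3" '
        NR > n         { exit }        # stop reading once past the reference line
        index($0, pat) { last = NR }   # remember the most recent matching line
        END            { if (last) print last }
    ' "$1"
}

# next_match FILE LINE PATTERN
# Prints the number of the first line at or after LINE that contains PATTERN.
next_match() {
    awk -v n="$2" -v pat="$3" 'NR >= n && index($0, pat) { print NR; exit }' "$1"
}
```

With `$line` holding the line number of a 'searchparam' hit, the block could then be printed with something like `sed -n "$(prev_match test.log "$line" ABC),$(next_match test.log "$line" ABC)p" test.log`. Both helpers print nothing when no match exists, so the result should be checked before it is passed to sed.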

Has anyone done this before?

EDIT: Updated the example log format and expected output.

  • Extend your content to show the surrounding parts of the XML fragment you are searching for. Commented May 25, 2018 at 8:31
  • Is the log file a well-formed XML file? Commented May 25, 2018 at 9:29
  • The log file is not XML; it's plain text. Commented May 25, 2018 at 9:35

2 Answers


Try this:

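# Count the log lines that start an XML dump (this relies on the "Printing" prefix)
Max=$(grep -c "^Printing" file.xml)

# One pass per XML dump; each result goes to its own file (1.xml, 2.xml, ...)
for count in $(seq 1 "$Max")
do
    sed -nr '/Printing/H;//,/ABC/G;s/\n(\n[^\n]*){'"$count"'}$//p' file.xml | sed 's/Printing XML://' > "$count.xml"
done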
  • Thanks, I have updated the question with more details. What I need to do is extract the entire XML out of a text log file... Commented May 25, 2018 at 9:35
  • Please share the expected output, as I did in my answer. Commented May 25, 2018 at 10:03
  • I have updated the question with the exact expected output. Commented May 25, 2018 at 10:09
  • Try my updated answer. Commented May 25, 2018 at 10:58
  • Thanks Siva, as I mentioned, the only constants are 'ABC' and 'searchparam' so I cannot depend on the presence of 'Printing'. Commented May 25, 2018 at 11:28

Here is what I wrote, but I am sure there is a shorter and more elegant way of doing this.

searchstring=searchparam
filename=test.log
pattern1=ABC

# Line numbers of every line that contains the search string
linenums=($(grep -n -- "${searchstring}" "${filename}" | cut -d: -f1))

for currentline in "${linenums[@]}"
do
  # Offset of the first line at or after the match that contains the pattern (closing tag)
  relativeend=$(tail -n "+${currentline}" "${filename}" | grep -n -- "${pattern1}" | head -n 1 | cut -d: -f1)

  # Walk backwards from the match to the nearest line that contains the pattern (opening tag)
  actualstartline=""
  index=$currentline
  while [ "$index" -gt 0 ]
  do
    if sed "${index}q;d" "${filename}" | grep -q -- "${pattern1}"; then
      actualstartline=$index
      break
    fi
    index=$((index - 1))
  done

  # Skip the hit when either tag is missing instead of printing a stale range
  if [ -z "$actualstartline" ] || [ -z "$relativeend" ]; then
    echo "Log break detected, content across multiple files"
    continue
  fi
  actualendline=$((currentline + relativeend - 1))

  echo ""
  echo "Start Line ${actualstartline}"
  echo "Current Line ${currentline}"
  echo "End Line ${actualendline}"
  sed -n "${actualstartline},${actualendline}p" "${filename}"
done
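A shorter variant is possible with a single awk pass that buffers each block and prints it only if it contains the search string. The sketch below is a different technique from the script above, not a rewrite of it. The function name `extract_blocks` is hypothetical, and it assumes that (a) the opening and closing tags sit on separate lines, (b) each of those lines contains the marker 'ABC', and (c) the XML starts at the first `<` on the opening line, so any log prefix before it can be dropped.

```shell
# extract_blocks FILE SEARCH MARKER   (hypothetical helper, for illustration)
# Prints every block that runs from one line containing MARKER to the next
# line containing MARKER, provided SEARCH occurs somewhere in the block.
extract_blocks() {
    awk -v search="$2" -v marker="$3" '
        index($0, marker) {
            if (!inblock) {               # opening tag: start buffering
                inblock = 1
                sub(/^[^<]*/, "")         # drop any log prefix before the first "<"
                buf = $0
                next
            }
            buf = buf "\n" $0             # closing tag: the block is complete
            if (index(buf, search)) print buf
            inblock = 0
            buf = ""
            next
        }
        inblock { buf = buf "\n" $0 }     # ordinary line inside a block
    ' "$1"
}
```

It would be called as `extract_blocks test.log searchparam ABC`. Unlike the line-number approach, a block with several 'searchparam' hits is printed once, but a block whose opening and closing tags share one line would not be handled by this sketch.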
