igal
  • 10.2k
  • 4
  • 45
  • 60

Note that, to be well-formed XML, your data needs a single root element and quoted attribute values, i.e. your data file should look more like this:

<!-- data.xml -->

<instances>

    <instance ab='1'>
        <a1>aa</a1>
        <a2>aa</a2>
    </instance>

    <instance ab='2'>
        <b1>bb</b1>
        <b2>bb</b2>
    </instance>

    <instance ab='3'>
        <c1>cc</c1>
        <c2>cc</c2>
    </instance>

</instances>

Now you can use XPath with xmlstarlet to get exactly what you want:

xmlstarlet sel -t -m '//instance' -c "./*" -n data.xml

This produces the following output:

<a1>aa</a1><a2>aa</a2>
<b1>bb</b1><b2>bb</b2>
<c1>cc</c1><c2>cc</c2>
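
The same approach can be narrowed with an XPath predicate. As a minimal sketch (the temporary file and the choice of `ab='2'` are just for illustration, and it assumes `xmlstarlet` is installed), this matches only the instance whose `ab` attribute is 2 and copies its child elements:

```shell
#!/bin/bash

# Recreate a small copy of the sample document in a temporary file
# (hypothetical path, used only for this illustration)
data=$(mktemp)
cat > "${data}" <<'EOF'
<instances>
    <instance ab='1'><a1>aa</a1><a2>aa</a2></instance>
    <instance ab='2'><b1>bb</b1><b2>bb</b2></instance>
</instances>
EOF

# Match only the instance element whose ab attribute equals 2,
# then copy its child elements followed by a newline
xmlstarlet sel -t -m "//instance[@ab='2']" -c "./*" -n "${data}"

rm -f "${data}"
```

With the sample data this should print only the `<b1>`/`<b2>` line from the output above.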

If you don't care about well-formed XML and just want to parse a text file that looks roughly like the one you've presented, you can do this with shell scripting and standard command-line tools alone. Here is a Bash script that produces the desired output, assuming each tag sits on its own line:

#!/bin/bash

# extract_instance_elements.bash

# Keep track of whether or not we're inside of an "instance" element
instance=0

# Loop through the lines of the file
while read -r line; do

    # Set the instance flag to true if we come across an opening tag
    if echo "${line}" | grep -q '<instance[ >]'; then
        instance=1

    # Set the instance flag to false and print a newline if we come across a closing tag
    elif echo "${line}" | grep -q '</instance>'; then
        instance=0
        echo

    # If we're inside an instance tag then print the child element
    elif [[ ${instance} == 1 ]]; then
        printf '%s' "${line}"
    fi

done < "${1}"

You would execute it like this:

bash extract_instance_elements.bash data.xml
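
As a variation, the same loop can be written with Bash's built-in pattern matching, so no `grep` process is spawned for every line. This is only a sketch under the same assumption of one tag per line; the function name `extract_children` and the temporary sample file are hypothetical and not part of the script above:

```shell
#!/bin/bash

# extract_children is a hypothetical helper name used for this sketch.
# It prints the child elements of each "instance" block on one line.
extract_children() {
    local line inside=0

    # read strips leading/trailing whitespace (default IFS);
    # -r keeps backslashes literal; the || handles a last line without a newline
    while read -r line || [[ -n ${line} ]]; do
        case ${line} in
            '<instance>'*|'<instance '*)
                # Opening tag, with or without attributes
                inside=1 ;;
            '</instance>'*)
                # Closing tag: finish the current output line
                inside=0
                echo ;;
            *)
                # Any other non-empty line inside an instance is a child element
                if (( inside )) && [[ -n ${line} ]]; then
                    printf '%s' "${line}"
                fi ;;
        esac
    done < "${1}"
}

# Small sample input, written to a temporary file for illustration
sample=$(mktemp)
cat > "${sample}" <<'EOF'
<instance ab='1'>
    <a1>aa</a1>
    <a2>aa</a2>
</instance>
<instance ab='2'>
    <b1>bb</b1>
    <b2>bb</b2>
</instance>
EOF

extract_children "${sample}"
rm -f "${sample}"
```

Because the patterns require `<instance` to be followed by `>` or a space, a root element such as `<instances>` is not mistaken for an opening `instance` tag.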