If you really want sed- or awk-like command-line processing for XML files then you should probably consider using an XML-processing command-line tool. Here are some of the tools that I've seen more commonly used:
Note that (in order to be valid XML) your XML data needs a root node and that your attribute values should be quoted, i.e. your data file should look more like this:
<!-- data.xml -->
<instances>
<instance ab='1'>
<a1>aa</a1>
<a2>aa</a2>
</instance>
<instance ab='2'>
<b1>bb</b1>
<b2>bb</b2>
</instance>
<instance ab='3'>
<c1>cc</c1>
<c2>cc</c2>
</instance>
</instances>
Now you can use XPath with xmlstarlet to get exactly what you want:
xmlstarlet sel -t -m '//instance' -c "./*" -n data.xml
This produces the following output:
<a1>aa</a1><a2>aa</a2>
<b1>bb</b1><b2>bb</b2>
<c1>cc</c1><c2>cc</c2>
If you're not concerned with having properly formatted XML and you just want to parse a text file that looks roughly like the one you've presented, then you can definitely accomplish what you want just using shell-scripting and standard command-line tools. Here is a Bash script that produces the desired output:
#!/bin/bash
# extract_instance_children.bash
# Keep track of whether or not we're inside of an "instance" element
instance=0
# Loop through the lines of the file
while read line; do
# Set the instance flag to true if we come across an opening tag
if echo "${line}" | grep -q '<instance.*>'; then
instance=1
# Set the instance flag to false and print a newline if we come across a closing tag
elif echo "${line}" | grep -q '</instance>'; then
instance=0
echo
# If we're inside an instance tag then print the child element
elif [[ ${instance} == 1 ]]; then
printf "${line}"
fi
done < "${1}"
You would execute it like this:
bash extract_instance_children.bash data.xml
Or, to use my favorite option, you could use Python. Here is a Python script that accomplishes the same task:
#!/usr/bin/env python2
# -*- encoding: ascii -*-
# extract_instance_children.py
import sys
import xml.etree.ElementTree
# Load the data
tree = xml.etree.ElementTree.parse(sys.argv[1])
root = tree.getroot()
# Update the XML tree
for instance in root.iter("instance"):
print(''.join([xml.etree.ElementTree.tostring(child).strip() for child in instance]))
And here is how you could run the script:
python extract_instance_children.py data.xml