If you don't need to handle arbitrary, properly formatted XML and just want to parse a text file laid out roughly like the one you've presented, you can do this with shell scripting and standard command-line tools. The approach is line-based, so it assumes the opening and closing `<instance>` tags each sit on their own line. Here is an awk script (as requested):
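```awk
#!/usr/bin/awk -f
# extract_instance_children.awk
# The path to awk varies between systems; adjust the shebang if needed.
# "#!/usr/bin/env awk" would not work here, because awk needs -f to read
# its program from this file when the script is run directly.
BEGIN {
    addchild = 0;
    children = "";
}
{
    # Opening tag for "instance" element: set the "addchild" flag
    if ($0 ~ /^[ \t]*<instance[^<>]+>/) {
        addchild = 1;
    }
    # Closing tag for "instance" element: print children, then reset the
    # "children" string and the "addchild" flag
    else if ($0 ~ /^[ \t]*<\/instance>/ && addchild == 1) {
        addchild = 0;
        printf("%s\n", children);
        children = "";
    }
    # Inside an "instance" element: strip leading/trailing whitespace and
    # append the line to "children"
    else if (addchild == 1) {
        gsub(/^[ \t]+/, "", $0);
        gsub(/[ \t]+$/, "", $0);
        children = children $0;
    }
}
```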
To run the script from a file, use a command like this one:

```sh
awk -f extract_instance_children.awk data.xml
```
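To check the logic without the data from the question, here is a minimal self-contained sketch. The sample input is hypothetical (made up for illustration), and the awk program is a condensed pattern-action rewrite of the same approach: set a flag on `<instance ...>`, trim and concatenate the lines inside, then print and reset on `</instance>`. Like the script, it assumes each tag is on its own line.

```shell
#!/bin/sh
# Hypothetical sample input; not the data from the question.
input='<root>
  <instance id="1">
    <a>foo</a>
    <b>bar</b>
  </instance>
  <instance id="2">
    <c>baz</c>
  </instance>
</root>'

# Pattern-action form of the same logic as extract_instance_children.awk.
result=$(printf '%s\n' "$input" | awk '
  /^[ \t]*<instance[^<>]+>/          { addchild = 1; next }
  /^[ \t]*<\/instance>/ && addchild  { print children; addchild = 0; children = ""; next }
  addchild                           { gsub(/^[ \t]+/, ""); gsub(/[ \t]+$/, ""); children = children $0 }
')

# Prints one line per <instance> element:
#   <a>foo</a><b>bar</b>
#   <c>baz</c>
printf '%s\n' "$result"
```

Lines outside any `<instance>` element (here `<root>` and `</root>`) are ignored because the flag is not set.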
And here is a Bash script that produces the desired output: