Revisions to Extract the Children of a Specific XML Element Type

added 1 character in body

edited Mar 30, 2018 at 19:19

igal

This uses the xml package from the Python Standard Library, which is also a strict XML parser.
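
A minimal sketch of how the standard-library parser might be used for this task is shown below. The sample data, the `root` wrapper element, the attribute names, and the helper name `instance_children` are all assumptions made for illustration, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: element and attribute names are assumptions.
SAMPLE = """<root>
  <instance ID="1" type="a">
    <name>first</name>
    <value>10</value>
  </instance>
  <other>
    <name>skipped</name>
  </other>
  <instance ID="2" type="b">
    <name>second</name>
  </instance>
</root>"""


def instance_children(xml_text, tag="instance"):
    """Return one string per <tag> element, holding its serialized children."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed input
    rows = []
    for element in root.iter(tag):
        parts = []
        for child in element:
            child.tail = None  # drop the whitespace that follows each child
            parts.append(ET.tostring(child, encoding="unicode"))
        rows.append("".join(parts))
    return rows


if __name__ == "__main__":
    for row in instance_children(SAMPLE):
        print(row)
```

Because the parser is strict, input with unquoted attribute values or without a single root element raises `xml.etree.ElementTree.ParseError` instead of being processed.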

added 1010 characters in body

edited Nov 12, 2017 at 0:04

igal

If you're not concerned with having properly formatted XML and you just want to parse a text file that looks roughly like the one you've presented, then you can definitely accomplish what you want just using shell-scripting and standard command-line tools. Here is an awk script (as requested):

#!/usr/bin/awk -f

# extract_instance_children.awk

BEGIN {
    addchild=0;
    children="";
}

{
    # Opening tag for "instance" element - set the "addchild" flag
    if($0 ~ "^ *<instance[^<>]+>") {
        addchild=1;
    }

    # Closing tag for "instance" element - reset "children" string and "addchild" flag, print children
    else if($0 ~ "^ *</instance>" && addchild == 1) {
        addchild=0;
        printf("%s\n", children);
        children="";
    }

    # Concatenating child elements - strip whitespace
    else if (addchild == 1) {
        gsub(/^[ \t]+/,"",$0);
        gsub(/[ \t]+$/,"",$0);
        children=children $0;
    }
}

To execute the script from a file, you would use a command like this one:

awk -f extract_instance_children.awk data.xml
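
For comparison, the same line-based logic can be sketched in Python. This is only a rough port of the awk script above under the same assumptions (each tag sits on its own line and the opening `instance` tag carries at least one attribute); the function name `extract_instance_children` and the sample lines are hypothetical.

```python
import re

# Same patterns as the awk script: optional leading spaces, then the tag.
OPEN_TAG = re.compile(r"^ *<instance[^<>]+>")
CLOSE_TAG = re.compile(r"^ *</instance>")


def extract_instance_children(lines):
    """Return one concatenated string of child lines per instance block."""
    results = []
    add_child = False
    children = ""
    for line in lines:
        if OPEN_TAG.search(line):
            # Opening tag: start collecting child lines
            add_child = True
        elif CLOSE_TAG.search(line) and add_child:
            # Closing tag: emit the collected children and reset
            add_child = False
            results.append(children)
            children = ""
        elif add_child:
            # Child line: strip the newline and surrounding spaces/tabs
            children += line.rstrip("\n").strip(" \t")
    return results


if __name__ == "__main__":
    # Hypothetical input that only roughly resembles XML
    sample = [
        "<instance ID=1 type=a>",
        "    <name>first</name>",
        "    <value>10</value>",
        "</instance>",
        "<other>",
        "    <name>skipped</name>",
        "</other>",
    ]
    for row in extract_instance_children(sample):
        print(row)
```

Like the awk version, this does not parse XML; it only pattern-matches lines, so it tolerates input that a strict parser would reject.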

And here is a Bash script that produces the desired output:

added 239 characters in body

edited Nov 11, 2017 at 19:10

igal

You should also be aware that there are several XML-specific programming/query languages:

XPath

XQuery

XSLT

Note that, in order to be well-formed XML, your data needs a single root element and your attribute values must be quoted, i.e. your data file should look more like this:
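
As a rough illustration of those two rules, the sketch below feeds three fragments to Python's strict standard-library parser. The fragments are hypothetical and are not the original data file; the helper name `is_well_formed` is likewise an assumption.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the strict parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragments, for illustration only
NO_ROOT = '<instance ID="1"><name>a</name></instance><instance ID="2"><name>b</name></instance>'
UNQUOTED = "<root><instance ID=1><name>a</name></instance></root>"
FIXED = '<root><instance ID="1"><name>a</name></instance></root>'

if __name__ == "__main__":
    for label, text in [("no root", NO_ROOT), ("unquoted", UNQUOTED), ("fixed", FIXED)]:
        print(label, is_well_formed(text))
```

Only the fragment with a single root element and quoted attribute values is accepted.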

added 1057 characters in body

edited Nov 11, 2017 at 18:53

igal

added 408 characters in body

edited Nov 11, 2017 at 15:46

igal

added 689 characters in body

edited Nov 11, 2017 at 15:39

igal

added 1211 characters in body; added 31 characters in body

edited Nov 11, 2017 at 15:24

igal

answered Nov 11, 2017 at 15:07

igal
