added 1 character in body
igal

This uses the xml package from the Python Standard Library which is also a strict XML parser.

added 1010 characters in body

If you're not concerned with having well-formed XML and you just want to parse a text file that looks roughly like the one you've presented, then you can accomplish what you want using only shell scripting and standard command-line tools. Here is an awk script (as requested):

#!/usr/bin/awk -f

# extract_instance_children.awk

BEGIN {
    addchild=0;
    children="";
}

{
    # Opening tag for "instance" element - set the "addchild" flag
    if($0 ~ "^[ \t]*<instance[^<>]+>") {
        addchild=1;
    }

    # Closing tag for "instance" element - print children, then reset "children" string and "addchild" flag
    else if($0 ~ "^[ \t]*</instance>" && addchild == 1) {
        addchild=0;
        printf("%s\n", children);
        children="";
    }

    # Concatenating child elements - strip whitespace
    else if (addchild == 1) {
        gsub(/^[ \t]+/,"",$0);
        gsub(/[ \t]+$/,"",$0);
        children=children $0;
    }
}

To execute the script from a file, you would use a command like this one:

awk -f extract_instance_children.awk data.xml
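
To see what the script does end to end, here is a minimal sketch of a test run. The sample input is made up for illustration (it is not the data from the question); it only assumes the general shape described above: no root node, unquoted attribute values, and one tag per line. The script body is a condensed copy of the same logic, written to a scratch directory so nothing in your working directory is touched.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: element and attribute names are invented for this demo.
cat > data.xml <<'EOF'
<instance ab=1>
    <a1>aa</a1>
    <a2>aa</a2>
</instance>
<instance ab=2>
    <b1>bb</b1>
    <b2>bb</b2>
</instance>
EOF

# Condensed copy of the awk logic shown above.
cat > extract_instance_children.awk <<'EOF'
BEGIN { addchild=0; children="" }
{
    # Opening "instance" tag: start collecting child lines
    if ($0 ~ "^ *<instance[^<>]+>") {
        addchild=1
    }
    # Closing "instance" tag: print collected children and reset state
    else if ($0 ~ "^ *</instance>" && addchild == 1) {
        addchild=0
        printf("%s\n", children)
        children=""
    }
    # Child line: strip surrounding whitespace and append
    else if (addchild == 1) {
        gsub(/^[ \t]+/, "", $0)
        gsub(/[ \t]+$/, "", $0)
        children=children $0
    }
}
EOF

# Run the script and show one output line per "instance" element.
output=$(awk -f extract_instance_children.awk data.xml)
printf '%s\n' "$output"
```

For this sample input the run prints two lines, `<a1>aa</a1><a2>aa</a2>` and `<b1>bb</b1><b2>bb</b2>`, i.e. the children of each instance element joined on a single line. Because the matching is line-based, it relies on each tag sitting on its own line; input laid out differently would need a real XML parser.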

And here is a Bash script that produces the desired output:


added 239 characters in body

You should also be aware that there are several XML-specific programming/query languages:

Note that (in order to be well-formed XML) your data needs a single root node and your attribute values must be quoted, i.e. your data file should look more like this:

added 1057 characters in body
added 408 characters in body
added 689 characters in body
added 1211 characters in body; added 31 characters in body