
I'm writing a Unix shell script that needs to pretty-print XML files, but the catch is that there are portions of them that I may not touch: Apache Jelly scripts contained within the XML files. So I need to convert this

<proc source="customer"><scriptParam value="_user"/><scriptText><jelly:script>

  <jelly:log level="info">
    this text needs
      to keep its indent level
        and this is none of my business
  </jelly:log>

  <!-- get date -->
  <sql:query var="rs"><![CDATA[
    select sysdate
    from dual
  ]]></sql:query>

</jelly:script>
</scriptText></proc>

Into this

<proc source="customer">
  <scriptParam value="_user"/>
  <scriptText>
<jelly:script>

  <jelly:log level="info">
    this text needs
      to keep its indent level
        and this is none of my business
  </jelly:log>

  <!-- get date -->
  <sql:query var="rs"><![CDATA[
    select sysdate
    from dual
  ]]></sql:query>

</jelly:script>
  </scriptText>
</proc>

Notice that the only change to the jelly:script element is a newline before it.

I couldn't find any option in xmllint or xmlstarlet to ignore a certain element. Is there any tool that can help me achieve this? I'm on Linux, if it matters.

  • "but the catch is that there are portions of them that I may not touch." - I think this disqualifies xmlstarlet, xmllint, and probably most XML parser based tools. Otherwise I would have suggested xmlstarlet ed. Commented Nov 6, 2015 at 14:57

1 Answer

If the requirement is that no whitespace inside the jelly:script element may change, you can use xml_pp (on Linux it is installed with the Perl package perl-XML-Twig). The option -p some-element preserves all whitespace inside those elements:

xml_pp -p jelly:script thefile.xml

That will create this:

<proc source="customer">
  <scriptParam value="_user"/>
  <scriptText>
    <jelly:script>

  <jelly:log level="info">
    this text needs
      to keep its indent level
        and this is none of my business
  </jelly:log>

  <!-- get date -->
  <sql:query var="rs"><![CDATA[
    select sysdate
    from dual
  ]]></sql:query>

</jelly:script>
  </scriptText>
</proc>

As you can see, the <jelly:script> start tag is also indented, because the added spaces are still outside the element.

If that is also forbidden, you must preserve one level higher (scriptText), or pipe the output to a command that removes those spaces again:

xml_pp -p jelly:script thefile.xml | perl -pe 's/^\s*(<jelly:script>)/$1/'
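
If Perl is not wanted for the clean-up step, the same idea can be sketched with sed. This is only an illustration: the function name strip_jelly_indent is made up for this example, and it assumes a sed that supports -E (GNU and BSD sed do). Unlike the one-liner above, it also matches a start tag that carries attributes.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): read XML on stdin and
# remove the indentation in front of an opening <jelly:script> tag.
# The tag name must be followed by whitespace or '>', so a start tag with
# attributes matches as well. Every other line is passed through unchanged.
strip_jelly_indent() {
  sed -E 's/^[[:space:]]+(<jelly:script[[:space:]>])/\1/'
}

# Small demonstration on inline sample input; only the start tag loses its indent.
printf '  <scriptText>\n    <jelly:script>\n  <jelly:log level="info">\n' | strip_jelly_indent
```

In a pipeline it would take the place of the perl command: xml_pp -p jelly:script thefile.xml | strip_jelly_indent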