1
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<extendedinfo type="html">
    <![CDATA[<table class="ResultTable" cellpadding=2 cellspacing=1 border=0><tr class="TableHeadingLine"><th bgcolor="#b3b3b3" align="left" colspan="6"><font face="arial, verdana, trebuchet, officina, sans-serif" size="+2"><B>Testcase: Init Testreport</B></font></th></tr><tr class="TableHeadingLine"><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="80px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="70px"></th></tr>]]>
</extendedinfo>
<extendedinfo type="html">
    <![CDATA[<tr><td class="DefineCell">58.675124</td><td class="DefaultCell" colspan="5"><i><font color="#008000">Set_Temperature is set to 23</font></i><br>Set_Temperature = 23</td></tr>]]>
</extendedinfo>

I have a .XML file generated by a tool in the above format, with html data within CDATA sections. Which parser or in what way can I retrieve the html data from the XMLfile using java?

2
  • Take a look to softwarecave.org/2014/02/18/… next time provide all xml structure then we could write code for you. Commented Mar 15, 2017 at 6:35
  • @SauliusNext the file is similar with lot of table data which goes on for 1k lines, so thought this might be enough Commented Mar 15, 2017 at 8:29

1 Answer 1

1

Just access the CDATA as text content

Variant 1 (DOM):

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

public void getCDATAFromHardcodedPathWithDom() {
    String yourSampleFile = "/path/toYour/sample/file.xml";
    String cdataNode = "extendedinfo";
    try (InputStream in =
            new BufferedInputStream(new FileInputStream(yourSampleFile))) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(in);
        NodeList elements = doc.getElementsByTagName(cdataNode);
        for (int i = 0; i < elements.getLength(); i++) {
            Node e = elements.item(i);
            System.out.println(e.getTextContent());
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

Variant 2 (stax):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public void getCDATAFromHardcodedPathWithStax() {
    String yourSampleFile = "/path/toYour/sample/file.xml";
    String cdataNode = "extendedinfo";
    XMLStreamReader r = null;
    try (InputStream in =
            new BufferedInputStream(new FileInputStream(yourSampleFile));)        {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        r = factory.createXMLStreamReader(in);
        while (r.hasNext()) {
            switch (r.getEventType()) {
            case XMLStreamConstants.START_ELEMENT:
                if (cdataNode.equals(r.getName().getLocalPart())) {
                    System.out.println(r.getElementText());
                }
                break;
            default:
                break;
            }
            r.next();
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    } finally {
        if (r != null) {
            try {
                r.close();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }
}

With /path/toYour/sample/file.xml

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<root>
<extendedinfo type="html">
    <![CDATA[<table class="ResultTable" cellpadding=2 cellspacing=1 border=0><tr class="TableHeadingLine"><th bgcolor="#b3b3b3" align="left" colspan="6"><font face="arial, verdana, trebuchet, officina, sans-serif" size="+2"><B>Testcase: Init Testreport</B></font></th></tr><tr class="TableHeadingLine"><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="80px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="70px"></th></tr>]]>
</extendedinfo>
<extendedinfo type="html">
    <![CDATA[<tr><td class="DefineCell">58.675124</td><td class="DefaultCell" colspan="5"><i><font color="#008000">Set_Temperature is set to 23</font></i><br>Set_Temperature = 23</td></tr>]]>
</extendedinfo>
</root>

It will give you

<table class="ResultTable" cellpadding=2 cellspacing=1 border=0><tr class="TableHeadingLine"><th bgcolor="#b3b3b3" align="left" colspan="6"><font face="arial, verdana, trebuchet, officina, sans-serif" size="+2"><B>Testcase: Init Testreport</B></font></th></tr><tr class="TableHeadingLine"><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="120px"></th><th class="TableHeadingCell" width="80px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="345px"></th><th class="TableHeadingCell" width="70px"></th></tr>


<tr><td class="DefineCell">58.675124</td><td class="DefaultCell" colspan="5"><i><font color="#008000">Set_Temperature is set to 23</font></i><br>Set_Temperature = 23</td></tr>

An interesting alternative using JAXB is given here:

Retrieve value from CDATA

An example on how to extract just all CDATA is given here:

Unable to check CDATA in XML using XMLEventReader in Stax

Sign up to request clarification or add additional context in comments.

2 Comments

Change to stax it is more suitable for big file reading and this is the answer :)
I added a second variant with stax, though question was about a small file with 1k lines of xml. One of my first answers in SO, hope it is ok to add code to accepted answer?

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.