
I've never processed XML before, so I'm not sure how to handle CDATA within an XML file. I'm getting lost in nodes, parents, child nodes, nList, etc.

Can anyone tell from these code snippets what my problem is?

My getTagValue() method works on all tags except "Details", which is the one that contains CDATA.

.....
NodeList nList = doc.getElementsByTagName("Assignment");
for (int temp = 0; temp < nList.getLength(); temp++) {
    Node nNode = nList.item(temp);
    if (nNode.getNodeType() == Node.ELEMENT_NODE) {
        Element eElement = (Element) nNode;
        results = ("Class : " + getTagValue("ClassName", eElement)) + 
                  ("Period : " + getTagValue("Period", eElement)) +
                  ("Assignment : " + getTagValue("Details", eElement));
        myAssignments.add(results);
    }
}
.....
private String getTagValue(String sTag, Element eElement) {
    NodeList nlList = eElement.getElementsByTagName(sTag).item(0).getChildNodes();

    Node nValue = (Node) nlList.item(0);
    if((CharacterData)nValue instanceof CharacterData)
    {
        return ((CharacterData) nValue).getData();
    }
    return nValue.getNodeValue();
}
  • Aside from Bogdan's excellent explanation: if you can use XOM, dom4j, etc., you'll probably be better off. Commented Apr 7, 2012 at 20:01

1 Answer


I suspect your problem is in the following line of code from the getTagValue method:

Node nValue = (Node) nlList.item(0);

You are always getting the first child! But you might have more than one.

The following example has 3 children: text node "detail ", CDATA node "with cdata" and text node " here":

<Details>detail <![CDATA[with cdata]]> here</Details>

If you run your code, you get only "detail " and lose the rest.
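To see those three children for yourself, here is a minimal sketch using only the JDK's built-in DOM parser with default settings. The class and helper names (MixedContentDemo, parseRoot, describeChildren) are made up for illustration; the code just prints each child of <Details> along with whether it is a CDATA section.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class MixedContentDemo {

    // Hypothetical helper: parses an XML string with a default factory
    // (coalescing is off, so CDATA sections stay separate nodes).
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Hypothetical helper: describes each child as "cdata:[...]" or, for any
    // other node type (only text nodes occur in this example), "text:[...]".
    static List<String> describeChildren(Element element) {
        List<String> result = new ArrayList<>();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            String kind = child.getNodeType() == Node.CDATA_SECTION_NODE ? "cdata" : "text";
            result.add(kind + ":[" + child.getNodeValue() + "]");
        }
        return result;
    }

    public static void main(String[] args) {
        Element details = parseRoot("<Details>detail <![CDATA[with cdata]]> here</Details>");
        // One line per child: text before, the CDATA section, text after
        describeChildren(details).forEach(System.out::println);
    }
}
```

Running it should print text:[detail ], cdata:[with cdata] and text:[ here]; the question's getTagValue only ever looks at the first of these.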

The following example has 1 child: a CDATA node "detail with cdata here":

<Details><![CDATA[detail with cdata here]]></Details>

If you run your code, you get everything.

But the same example as above, written this way:

<Details>
   <![CDATA[detail with cdata here]]>
</Details>

now has 3 children, because the spaces and line feeds around the CDATA section are picked up as text nodes. If you run your code, you get only the first, whitespace-only text node (containing a line feed) and lose the rest.
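A small sketch comparing the compact and indented forms side by side, again with a default JDK parser. The names WhitespaceDemo, parseRoot and firstChildValue are hypothetical; firstChildValue is a simplified stand-in for the question's getTagValue that, like the original, reads only the first child.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class WhitespaceDemo {

    // Hypothetical helper: parses an XML string with a default (non-coalescing) factory.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Simplified stand-in for the question's getTagValue: reads only the FIRST child.
    static String firstChildValue(Element element) {
        return element.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) {
        Element compact = parseRoot("<Details><![CDATA[detail with cdata here]]></Details>");
        Element indented = parseRoot("<Details>\n   <![CDATA[detail with cdata here]]>\n</Details>");

        System.out.println(compact.getChildNodes().getLength());  // 1 (just the CDATA section)
        System.out.println(indented.getChildNodes().getLength()); // 3 (whitespace, CDATA, whitespace)

        System.out.println("[" + firstChildValue(compact) + "]"); // [detail with cdata here]
        // Only the line feed and indentation that precede the CDATA section
        System.out.println("[" + firstChildValue(indented) + "]");
    }
}
```

Trimming the result would not help in the indented case: the CDATA text is simply not part of that first node.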

You either have to loop through all the children (however many there are) and concatenate their values to get the full result, or, if you don't need to distinguish plain text from text inside CDATA, set the coalescing property on the document builder factory first:

DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
docFactory.setCoalescing(true);
...

Coalescing specifies that the parser produced by this factory will convert CDATA nodes to Text nodes and append them to the adjacent text node, if there is one. By default, this property is false.
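Both options can be sketched with the JDK's DOM API. Here getTagValue keeps the question's name and parameters but concatenates every text and CDATA child instead of reading only item(0); the class name ConcatenateChildrenDemo and the parseRoot helper are hypothetical. It checks node types rather than instanceof CharacterData, because Comment nodes are also CharacterData and would otherwise leak into the value. The second half of main turns coalescing on and shows that <Details> then has a single text child.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ConcatenateChildrenDemo {

    // Hypothetical helper: parses XML, optionally with coalescing turned on.
    static Element parseRoot(String xml, boolean coalescing) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setCoalescing(coalescing); // true: CDATA becomes Text, merged with neighbours
        try {
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Option 1: concatenate all text and CDATA children of the first sTag element.
    static String getTagValue(String sTag, Element eElement) {
        NodeList matches = eElement.getElementsByTagName(sTag);
        if (matches.getLength() == 0) {
            return null; // tag not present
        }
        StringBuilder value = new StringBuilder();
        NodeList children = matches.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            // Skip comments, nested elements, processing instructions, etc.
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                value.append(child.getNodeValue());
            }
        }
        return value.toString();
    }

    public static void main(String[] args) {
        String xml = "<Assignment><Details>detail <![CDATA[with cdata]]> here</Details></Assignment>";

        Element assignment = parseRoot(xml, false);
        System.out.println(getTagValue("Details", assignment)); // detail with cdata here

        // Option 2: with coalescing on, Details has one Text child holding everything
        Element coalesced = parseRoot(xml, true);
        Node details = coalesced.getElementsByTagName("Details").item(0);
        System.out.println(details.getChildNodes().getLength()); // 1
        System.out.println(details.getFirstChild().getNodeValue()); // detail with cdata here
    }
}
```

If you don't need to walk the children yourself, Node.getTextContent() (DOM Level 3) also returns the concatenated text of an element, CDATA included, though it additionally pulls in text from any nested elements.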


1 Comment

I was just looking for the same thing in JS: element.childNodes[0].nodeValue instead of element.nodeValue did the trick for me, thanks!
