
I'm trying to parse CDATA types in XML. The code runs fine and prints Links: in the console (about 50 times, because that's how many links I have), but the links don't appear; each one is just blank console space. What could I be missing?

package Parse;

import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XMLParse {
  public static void main(String[] args) throws Exception {
    File file = new File("c:test/returnfeed.xml");
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.parse(file);

    NodeList nodes = doc.getElementsByTagName("video");
    for (int i = 0; i < nodes.getLength(); i++) {
      Element element = (Element) nodes.item(i);
      NodeList title = element.getElementsByTagName("videoURL");
      Element line = (Element) title.item(0);
      System.out.println("Links: " + getCharacterDataFromElement(line));
    }
  }
  public static String getCharacterDataFromElement(Element e) {
    Node child = e.getFirstChild();
    if (child instanceof CharacterData) {
      CharacterData cd = (CharacterData) child;
      return cd.getData();
    }
    return "";
  }
}

Result:

Links: 

Links: 

Links: 

Links: 

Links: 

Links: 

Links: 

Sample XML: (Not full document)

<?xml version="1.0" ?> 
<response xmlns:uma="http://websiteremoved.com/" version="1.0">

    <timestamp>
        <![CDATA[  July 18, 2012 5:52:33 PM PDT 
          ]]> 
    </timestamp>
    <resultsOffset>
        <![CDATA[  0 
          ]]> 
    </resultsOffset>
    <status>
        <![CDATA[  success 
        ]]> 
    </status>
    <resultsLimit>
        <![CDATA[  207 
        ]]> 
    </resultsLimit>
    <resultsCount>
        <![CDATA[  207 
        ]]> 
    </resultsCount>
    <videoCollection>
        <name>
            <![CDATA[  Video API 
            ]]> 
        </name>
        <count>
            <![CDATA[  207 
            ]]> 
        </count>
        <description>
            <![CDATA[  
            ]]> 
        </description>
        <videos>
            <video>
                <id>
                    <![CDATA[  8177840 
                    ]]> 
                </id>
                <headline>
                    <![CDATA[  Test1
                    ]]> 
                </headline>
                <shortHeadline>
                    <![CDATA[  Test2
                    ]]> 
                </shortHeadline>
                <description>
                    <![CDATA[ Test3

                    ]]> 
                </description>
                <shortDescription>
                    <![CDATA[ Test4

                    ]]> 
                </shortDescription>
                <posterImage>
                    <![CDATA[ http://a.com.com/media/motion/2012/0718/los_120718_los_bucher_on_howard.jpg

                    ]]> 
                </posterImage>
                <videoURL>
                    <![CDATA[ http://com/removed/2012/0718/los_120718_los_bucher_on_howard.mp4

                    ]]> 
                </videoURL>
            </video>
        </videos>
    </videoCollection>
</response>
  • could you provide a sample xml? or a part thereof? Commented Jul 19, 2012 at 3:58
  • XML Added. I'm trying to get the http URL's in the "videoURL" tag. Commented Jul 19, 2012 at 4:10
  • Are you sure that you have only one child node at 'Node child = e.getFirstChild();'? Get all the child nodes and inspect them in the debugger. Commented Jul 19, 2012 at 5:28
  • Have you checked the XML that you've posted? You've missed the end tags. And I agree with @RafaelOsipov: I think the issue is the assumption that there's only one child for every node. Commented Jul 19, 2012 at 13:17
  • have you tried the solution that I provided? i was hoping this would solve your issue :) Commented Jul 19, 2012 at 21:56

2 Answers


Instead of checking only the first child, it would be prudent to check whether the node has other children as well. In your case (and I guess if you had debugged that node, you would've known), the element passed to getCharacterDataFromElement had multiple children: the whitespace before the CDATA section is a text node of its own, and it is the first child. I updated the code, and this version might point you in the right direction:

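public static String getCharacterDataFromElement(Element e) {
    // Walk every child, not just the first one: whitespace around the
    // CDATA section shows up as separate text nodes.
    NodeList list = e.getChildNodes();

    for (int index = 0; index < list.getLength(); index++) {
        Node node = list.item(index);
        if (node instanceof CharacterData) {
            String data = ((CharacterData) node).getData();

            // Skip whitespace-only nodes and return the first node with real content.
            if (data != null && data.trim().length() > 0) {
                return data;
            }
        }
    }
    // No non-blank character data found.
    return "";
}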
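
To see the multiple children for yourself without a debugger, you could dump the node types of one videoURL element. This is only a minimal sketch: the class name ChildDumpSketch, its helper methods, and the inline XML string (shaped like the sample in the question, with an example.com URL) are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ChildDumpSketch {
    // Hypothetical sample: whitespace, a CDATA section, then whitespace again,
    // laid out like the <videoURL> element in the question.
    static final String XML =
        "<video><videoURL>\n    <![CDATA[ http://example.com/a.mp4\n    ]]>\n</videoURL></video>";

    // Parses the inline sample and returns its single <videoURL> element.
    static Element parseVideoUrl() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            return (Element) doc.getElementsByTagName("videoURL").item(0);
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse sample XML", ex);
        }
    }

    // Lists the kind of each direct child, in document order.
    static List<String> describeChildren(Element e) {
        List<String> kinds = new ArrayList<>();
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            short type = children.item(i).getNodeType();
            if (type == Node.CDATA_SECTION_NODE) {
                kinds.add("CDATA");
            } else if (type == Node.TEXT_NODE) {
                kinds.add("TEXT");
            } else {
                kinds.add("OTHER");
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        System.out.println(describeChildren(parseVideoUrl()));
    }
}
```

With a default (non-coalescing) factory, the list should come out as TEXT, CDATA, TEXT. Since both Text and CDATASection are CharacterData, the original getFirstChild() check passes on the leading whitespace node and returns its blank contents.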

1 Comment

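Calling setCoalescing(true) on your DocumentBuilderFactory makes the parser convert CDATA sections to text nodes and append them to the adjacent text, so the element ends up with a single text child instead of separate whitespace and CDATA nodes, as described in stackoverflow.com/questions/8045716/….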
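
A rough sketch of what that comment suggests, assuming the same kind of inline sample XML as in the question (the class CoalescingSketch, its methods, and the example.com URL are made up for illustration). The firstChildData helper deliberately keeps the question's first-child logic to show that it starts working once the factory coalesces:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class CoalescingSketch {
    // Hypothetical sample shaped like the <videoURL> element in the question.
    static final String XML =
        "<video><videoURL>\n    <![CDATA[ http://example.com/a.mp4\n    ]]>\n</videoURL></video>";

    // Parses the sample with or without coalescing and returns <videoURL>.
    static Element parseVideoUrl(boolean coalescing) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // true: CDATA sections become text and are merged with adjacent text.
            factory.setCoalescing(coalescing);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            return (Element) doc.getElementsByTagName("videoURL").item(0);
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse sample XML", ex);
        }
    }

    // Same first-child logic as the question's getCharacterDataFromElement.
    static String firstChildData(Element e) {
        Node child = e.getFirstChild();
        return child instanceof CharacterData ? ((CharacterData) child).getData() : "";
    }

    public static void main(String[] args) {
        // Without coalescing the first child is the whitespace-only text node.
        System.out.println("[" + firstChildData(parseVideoUrl(false)).trim() + "]");
        // With coalescing the single text child also contains the URL.
        System.out.println("[" + firstChildData(parseVideoUrl(true)).trim() + "]");
    }
}
```

The merged text still carries the surrounding whitespace, so a trim() is probably still wanted before using the URL.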

I would consider using getTextContent()

String string = cdataNode.getTextContent();

1 Comment

This solution requires neither a cast nor a call to any CharacterData-specific method.
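
Applied to the loop in the question, the getTextContent() approach might look roughly like the sketch below. The class TextContentSketch, the videoUrls method, and the inline two-video sample with example.com URLs are assumptions for illustration, not the asker's real feed. getTextContent() on an element concatenates the text of all its descendants, CDATA included, so one trim() removes the surrounding whitespace.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class TextContentSketch {
    // Hypothetical two-video sample laid out like the feed in the question.
    static final String XML =
        "<videos>"
        + "<video><videoURL>\n  <![CDATA[ http://example.com/a.mp4\n  ]]>\n</videoURL></video>"
        + "<video><videoURL>\n  <![CDATA[ http://example.com/b.mp4\n  ]]>\n</videoURL></video>"
        + "</videos>";

    // Collects the trimmed text of the first <videoURL> under each <video>.
    static List<String> videoUrls(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> urls = new ArrayList<>();
            NodeList videos = doc.getElementsByTagName("video");
            for (int i = 0; i < videos.getLength(); i++) {
                Element video = (Element) videos.item(i);
                Node url = video.getElementsByTagName("videoURL").item(0);
                if (url != null) {
                    // Whitespace text and CDATA are concatenated, then trimmed.
                    urls.add(url.getTextContent().trim());
                }
            }
            return urls;
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        for (String url : videoUrls(XML)) {
            System.out.println("Links: " + url);
        }
    }
}
```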
