I want to parse the following URL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=224589801
To do this, I came up with the following method:
public void parseXml2(String URL) {
    DOMParser parser = new DOMParser();
    try {
        // Fetch and parse the XML returned by the esummary URL
        parser.parse(new InputSource(new URL(URL).openStream()));
        Document doc = parser.getDocument();

        // Print the text content of every <Item> element
        NodeList nodeList = doc.getElementsByTagName("Item");
        for (int i = 0; i < nodeList.getLength(); i++) {
            Node n = nodeList.item(i);
            Node actualNode = n.getFirstChild();
            if (actualNode != null) {
                System.out.println(actualNode.getNodeValue());
            }
        }
    } catch (SAXException ex) {
        Logger.getLogger(TaxMapperXml.class.getName()).log(Level.SEVERE, null, ex);
    } catch (IOException ex) {
        Logger.getLogger(TaxMapperXml.class.getName()).log(Level.SEVERE, null, ex);
    }
}
With this method I can get the values of the Item nodes, but I can't get any of their attributes. I experimented with getAttribute() and with NamedNodeMap, but to no avail.
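For reference, this is roughly the kind of attribute access I was expecting to work (just a sketch; the attribute name "Name" is taken from the esummary output, and the loop is the same one as above):

    for (int i = 0; i < nodeList.getLength(); i++) {
        Node n = nodeList.item(i);

        // Attempt 1: cast the node to Element and read the attribute directly
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            Element item = (Element) n;
            System.out.println(item.getAttribute("Name"));
        }

        // Attempt 2: go through the node's NamedNodeMap of attributes
        NamedNodeMap attrs = n.getAttributes();
        if (attrs != null && attrs.getNamedItem("Name") != null) {
            System.out.println(attrs.getNamedItem("Name").getNodeValue());
        }
    }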
Why do I have to call n.getFirstChild().getNodeValue() to get the actual value? n.getNodeValue() just returns null. Isn't this counter-intuitive? In my case the node obviously doesn't have any subnodes.

Is there a more robust and widely accepted way of parsing XML files with DOM? My files aren't going to be big (15-20 lines at most), so SAX shouldn't be necessary (or is it?).
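To illustrate the first point, here is a minimal sketch of the behaviour I see (the element and its text value are made up, but they match the shape of the esummary output):

    // Hypothetical element: <Item Name="Caption" Type="String">ABC123</Item>
    Node n = nodeList.item(i);                              // the Item element node
    System.out.println(n.getNodeValue());                   // prints "null"
    System.out.println(n.getFirstChild().getNodeValue());   // prints "ABC123"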