7

This is how I can parse a well-formed XML document in Java:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();

// text contains the XML content
Document doc = builder.parse(new InputSource(new StringReader(text)));

An example for text is this:

<a>
  <b/>
</a>

How can I parse a DocumentFragment? For example, this:

<a>
  <b/>
</a>
<a>
  <b/>
</a>

NOTE: I want to use org.w3c.dom and no other libraries/technologies, if possible.

3 Answers

6

I just thought of a silly solution. I could wrap the fragment in a dummy element like this:

<dummy><a>
  <b/>
</a>
<a>
  <b/>
</a></dummy>

And then programmatically filter out that dummy element again, like this:

String wrapped = "<dummy>" + text + "</dummy>";
Document parsed = builder.parse(new InputSource(new StringReader(wrapped)));
DocumentFragment fragment = parsed.createDocumentFragment();

// Here, the document element is the <dummy/> element.
NodeList children = parsed.getDocumentElement().getChildNodes();

// Move dummy's children over to the document fragment
while (children.getLength() > 0) {
    fragment.appendChild(children.item(0));
}

But that's a bit lame; let's see if there is a better solution.
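For reference, the wrapping approach above can be put together as one self-contained sketch. The class name `FragmentParser` and the method name `parseFragment` are made up for illustration; everything else uses only `javax.xml.parsers` and `org.w3c.dom`, as asked in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name is not from the original answer.
class FragmentParser {

    // Parses text that may contain several top-level elements by wrapping it
    // in a temporary <dummy> root, then moving the children into a fragment.
    static DocumentFragment parseFragment(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        String wrapped = "<dummy>" + text + "</dummy>";
        Document parsed = builder.parse(new InputSource(new StringReader(wrapped)));
        DocumentFragment fragment = parsed.createDocumentFragment();
        Element dummy = parsed.getDocumentElement();
        // appendChild() detaches the node from <dummy> first, so this loop terminates.
        while (dummy.hasChildNodes()) {
            fragment.appendChild(dummy.getFirstChild());
        }
        return fragment;
    }

    public static void main(String[] args) throws Exception {
        DocumentFragment fragment = parseFragment("<a>\n  <b/>\n</a>\n<a>\n  <b/>\n</a>");
        int elements = 0;
        // Count only element children; whitespace between the <a> elements is kept as text nodes.
        for (Node n = fragment.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                elements++;
            }
        }
        System.out.println(elements); // prints 2
    }
}
```

One caveat with this sketch: if the fragment text starts with an XML declaration (`<?xml ... ?>`), the wrapped string is no longer well-formed, so the declaration would have to be stripped first.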


5 Comments

Exactly what I was going to suggest - you beat me to it.
XML parsers on other platforms support a DocumentFragment, so you needn't add a hack.
@Phlip: What are those "other platforms", and how would that have helped me back when I asked / answered this question?
Gnome's libxml2 (which Python and Ruby use) permits fragments. But I admit I'm not helping you so much as trying to help the community...
@Phlip: This is quite a Java-specific question about "the Java standard DOM API", so I'm not convinced this helps the community...
0

Further expanding on the answers already given:

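public static DocumentFragment stringToFragment(Document document, String source) throws Exception
{
    // Wrap the fragment so that it parses as a single well-formed document.
    source = "<dummy>" + source + "</dummy>";
    // stringToDom() is a helper that parses a String into a Document (not shown here).
    Node node = stringToDom(source).getDocumentElement();
    // Copy the parsed tree into the target document, which will own the fragment's nodes.
    node = document.importNode(node, true);
    DocumentFragment fragment = document.createDocumentFragment();
    NodeList children = node.getChildNodes();
    // The NodeList is live: appendChild() moves each child out of <dummy>, so the list shrinks.
    while (children.getLength() > 0)
    {
        fragment.appendChild(children.item(0));
    }
    return fragment;
}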
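
The method above relies on a `stringToDom` helper that is not shown. A minimal sketch of what it might look like, assuming it simply parses the string with the standard `DocumentBuilder` (the enclosing class name `DomStrings` is made up for illustration):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical holder class; only the method name comes from the answer above.
class DomStrings {

    // Assumed behaviour: parse a well-formed XML string into a DOM Document.
    static Document stringToDom(String source) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(source)));
    }
}
```

Because the result lives in its own `Document`, the `importNode` call in `stringToFragment` is what makes the nodes usable in the target document.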

2 Comments

All you need is a stringToDom() now.
I think the answer at stackoverflow.com/a/1509229/16673 shows how this can be implemented.
-2

I would suggest not using the DOM API. It's slow and ugly.

Use streaming StAX instead. It's built into JDK 1.6+. You can fetch one element at a time, and it won't choke if you're missing a root element.

http://en.wikipedia.org/wiki/StAX

http://download.oracle.com/javase/6/docs/api/javax/xml/stream/XMLStreamReader.html

3 Comments

Thanks. I don't have a choice but to use DOM, as I'm working on a big legacy system. Generally, it's neither slow nor ugly, IMO... Unless you can prove slowness to me with benchmarks?
I suppose slow is a relative term. DOM is fine for smaller documents. For large ones it consumes too much memory, and that's what slows things down.
@ccleve A minimal example using StAX (Java 1.7, Xerces as implementation) shows that it does choke if the XML is not well-formed (missing a root element). Using <herpTag/><derpTag/> results in an XMLStreamException stating "The markup in the document following the root element must be well-formed". My intention was to use StAX to assemble a DocumentFragment object. Do you have an example of using StAX in this manner? It would be nice to create DocumentFragments without having to implement a parser or wrap things in dummy tags.
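
The behaviour described in the last comment can be checked with a small sketch. The class and method names (`StaxRootCheck`, `readsCleanly`) are made up for illustration, and the outcome depends on which StAX implementation is on the classpath. With the parser bundled in the JDK I would expect the unwrapped input to fail, as the comment reports.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; not part of the original discussion.
class StaxRootCheck {

    // Returns true if the reader consumes the whole input without an error,
    // false if it throws an XMLStreamException along the way.
    static boolean readsCleanly(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next();
            }
            reader.close();
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Single root element: well-formed input.
        System.out.println(readsCleanly("<dummy><herpTag/><derpTag/></dummy>"));
        // Two top-level elements: the case from the comment above.
        System.out.println(readsCleanly("<herpTag/><derpTag/>"));
    }
}
```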
