I am trying to parse an XML file using Java.

The XML file is only 256 KB, and I am using a DOM parser to parse it. How can I parse the content of a large XML file?

Here's the method that parses the file content:

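import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public Document parse_a_string(StringBuffer decodedFile) {
    Document doc1 = null;
    try {
        DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
        DocumentBuilder db = factory.newDocumentBuilder();
        InputSource inStream = new InputSource();

        // problem here
        inStream.setCharacterStream(new StringReader(decodedFile.toString()));

        doc1 = db.parse(inStream);
    } catch (Exception e) {
        // Report the failure instead of silently returning null
        e.printStackTrace();
    }
    return doc1;
}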

The file content is held in a StringBuffer, decodedFile, but StringReader accepts only a String.

5 Answers

5

For large documents (though I wouldn't call yours large), I'd use StAX.

2

Take a look at the JDOM XML parsing library. In my opinion, it's miles ahead of the native Java parsers.

For the code you provided, you still have to walk the DOM tree and retrieve the elements you need. See the official Java tutorial on working with XML for more information.
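As a rough illustration of walking the tree, here is a minimal sketch using the built-in DOM API. The class name, the element name "title", and the sample XML are all made up for the example; they are not taken from your file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomWalkExample {

    // Parses XML text into a DOM tree, much like the method in the question.
    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    // Collects the text of every <title> element ("title" is a hypothetical name).
    static List<String> titles(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("title");
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    // Walks the whole tree recursively and counts the element nodes.
    static int countElements(Node node) {
        int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            count += countElements(children.item(i));
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book><title>Dune</title></book>"
                   + "<book><title>Emma</title></book></library>";
        Document doc = parse(xml);
        System.out.println(titles(doc));        // prints [Dune, Emma]
        System.out.println(countElements(doc)); // prints 5
    }
}
```

getElementsByTagName is the quickest route when you know the element names up front; the recursive walk is the general pattern when you need to visit everything.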

1 Comment

If the question is just about parsing the 256 KB file, JDOM is good, as are dom4j and XOM.
2

You might want to look at a StAX implementation like Woodstox. It lets you pull elements from the parser, instead of the parser pushing data into the app, and lets you pause parsing.
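To give an idea of what pulling looks like, here is a minimal sketch written against the standard javax.xml.stream API that ships with the JDK. As far as I know, Woodstox implements the same interfaces, so the code should not need to change if you put it on the classpath. The class name, the element name "title", and the sample XML are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxPullExample {

    // Pulls events one at a time and collects the text of each <title> element.
    // "title" is a hypothetical element name.
    static List<String> titles(String xml) throws Exception {
        List<String> result = new ArrayList<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            // The application drives the loop, so it can stop or pause at any event.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // getElementText() reads the text up to the matching end tag.
                    result.add(reader.getElementText());
                }
            }
        } finally {
            // XMLStreamReader is not AutoCloseable, so close it by hand.
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book><title>Dune</title></book>"
                   + "<book><title>Emma</title></book></library>";
        System.out.println(titles(xml)); // prints [Dune, Emma]
    }
}
```

Only the current event is held in memory, which is why this style scales to files far larger than 256 KB.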

2

256 KB is a pretty small file nowadays: yesterday I was working with a 45 GB file, which is larger by a factor of roughly 180,000!

It's not clear what your problem is. Any of the normal Java parsing techniques will work perfectly well. Which of them you use depends on why you are parsing the file and what you want to do with the data.

Having said that, many people seem to choose DOM by default because it is so well entrenched. However, more modern object models such as JDOM or XOM are much easier to work with.

2 Comments

Could you please tell me what you used to parse that 45 GB file? I need to convert a large XML file, on the order of 40-50 GB, to TSV or CSV. How should I approach this?
I was using the streaming facilities in Saxon-EE, documented at saxonica.com/documentation/sourcedocs/streaming.xml
0

Don't read the file into a String/StringReader and all that jazz. Parse the file directly via db.parse(new FileInputStream(...)). Reading the file into memory first just wastes memory and time.
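Something along these lines, as a sketch only. It assumes the XML is available as a plain file on disk (which may not hold if your content has to be decoded first), and the class name, temporary file, and sample content are invented for the demonstration.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DirectFileParseExample {

    // Hands the parser a byte stream over the file; no intermediate String is built.
    static Document parseFile(Path path) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = Files.newInputStream(path)) {
            return db.parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // Write a small sample file so the example is self-contained.
        Path tmp = Files.createTempFile("sample", ".xml");
        Files.write(tmp, "<config><item/></config>".getBytes(StandardCharsets.UTF_8));
        try {
            Document doc = parseFile(tmp);
            System.out.println(doc.getDocumentElement().getTagName()); // prints config
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

A side benefit of giving the parser bytes rather than characters is that it can pick up the encoding from the XML declaration itself.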
