Comparing two XML files using Java

Question

I have two XML files, say abc.xml and 123.xml, which are almost identical: they share the same content, but the second one, 123.xml, has more content than the first. I want to read both files using Java and check whether the content of each tag in abc.xml is the same as in 123.xml, something like an object comparison. Please suggest how to read the XML files in Java and start comparing.

Thanks.

In your case, I would probably suggest a DOM parser (provided your files are not huge). Then you would effectively have your objects and could compare them field by field.
– Aleks G, Commented Apr 25, 2012 at 7:56
What happens if there are two nodes with the same tag? How are you going to compare them?
– Eugen Martynov, Commented Apr 25, 2012 at 8:05
Actually, all the contents of abc.xml are present in 123.xml. I just want to check that the elements with tags in abc.xml are there in 123.xml.
– Sangram Anand, Commented Apr 25, 2012 at 8:48

Zaz Gmy · Accepted Answer · 2012-04-25 08:05:35Z

13

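If you just want to check whether the two documents are equal, use this:

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.junit.Assert;
import org.w3c.dom.Document;

// Configure the parser so that insignificant differences are ignored.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setCoalescing(true);                        // merge CDATA sections into adjacent text
dbf.setIgnoringElementContentWhitespace(true);  // per the Javadoc, requires a validating parser
dbf.setIgnoringComments(true);
DocumentBuilder db = dbf.newDocumentBuilder();

Document doc1 = db.parse(new File("file1.xml"));
doc1.normalizeDocument();

Document doc2 = db.parse(new File("file2.xml"));
doc2.normalizeDocument();

// isEqualNode compares the two trees recursively: names, attributes, values, children.
Assert.assertTrue(doc1.isEqualNode(doc2));

Otherwise, see XMLUnit: http://xmlunit.sourceforge.net/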

1 Comment

ziggy Over a year ago

Can this be used if one of the xml documents is qualified with namespaces but the other is not?

aviad · Accepted Answer · 2012-04-25 08:06:20Z

5

I would go for XMLUnit. It lets you make assertions about:

The differences between two pieces of XML
The outcome of transforming a piece of XML using XSLT
The evaluation of an XPath expression on a piece of XML
The validity of a piece of XML
Individual nodes in a piece of XML that are exposed by DOM Traversal

Good Luck!

Kai · Accepted Answer · 2012-04-25 08:04:43Z

4

I would use JAXB to generate Java objects from the XML files and then compare the Java objects. That would make the handling much easier.

A_A · Accepted Answer · 2012-04-25 08:12:44Z

In general, if you know that you have two files with identical structure but slightly different and unordered content you are going to have to "read" the files to compare the contents.

If you have the XML Schema for your XML files then you could use JAXB to create a set of classes that will represent the specific DOM that is defined by your XML schema. The benefit of this approach is that you will not have to parse the XML file through generic functions for elements and attributes but rather through the actual fields that make sense to your problem.

Of course, to be able to detect the presence of the same entry across both files you are going to have to "match" them through some common field (for example, some ID).

To help with the duplicate-discovery process, you could use a suitable data structure from Java's collections, such as Set (or one of its implementations).
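The matching step described above could be sketched roughly as follows. This is only an illustration under assumptions: the element name `item` and the attribute name `id` are hypothetical (the question does not say what the entries look like), and the XML is parsed from strings to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlIdMatcher {

    // Parses an XML string; checked parser exceptions are wrapped to keep callers simple.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Collects the value of idAttr from every element named tag (both names are hypothetical).
    static Set<String> ids(Document doc, String tag, String idAttr) {
        Set<String> result = new TreeSet<>();
        NodeList nodes = doc.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            if (e.hasAttribute(idAttr)) {
                result.add(e.getAttribute(idAttr));
            }
        }
        return result;
    }

    // Returns the IDs that occur in the smaller document but not in the larger one.
    static Set<String> missingIds(Document smaller, Document larger, String tag, String idAttr) {
        Set<String> missing = ids(smaller, tag, idAttr);
        missing.removeAll(ids(larger, tag, idAttr));
        return missing;
    }

    public static void main(String[] args) {
        Document abc = parse("<items><item id='1'/><item id='2'/></items>");
        Document big = parse("<items><item id='2'/><item id='3'/><item id='1'/></items>");
        // Every id in the first document also occurs in the second, so this prints true.
        System.out.println(missingIds(abc, big, "item", "id").isEmpty());
    }
}
```

Because the IDs go into a Set, the order of the entries in either file does not matter. Once two entries are paired by ID, their remaining fields can be compared one by one.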

I hope this helps.

Michael Kay · Accepted Answer · 2012-04-25 08:38:12Z

The right approach depends on two factors:

(a) how much control do you want over how the comparison is done? For example, do you need to control whether whitespace is significant, whether comments should be ignored, whether namespace prefixes should be ignored, whether redundant namespace declarations should be ignored, whether the XML declaration should be ignored?

(b) what answer do you want? (i) a boolean: same/different, (ii) a list of differences suitable for a human to process, (iii) a list of differences suitable for an application to process.

The two techniques I use are: (a) convert both files to Canonical XML and then compare strings. This gives very little control and only gives a boolean result. (b) compare the two trees using the XPath 2.0 deep-equal() function or the extended Saxon version saxon:deep-equal(). The Saxon version gives more control over how the comparison is done, and a more detailed report of the differences found (for human reading, not for application use).

If you want to write Java code, you could of course implement your own comparison logic - for example you could find an open source implementation of XPath deep-equal, and modify it to meet your requirements. It's only a hundred or so lines of code.
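As a rough idea of what such custom comparison logic might look like, here is a minimal sketch. It is not the XPath deep-equal() algorithm and not Saxon's implementation, just a simplified recursive comparison over DOM nodes. The class name and the two switches (`ignoreWhitespace`, `ignoreComments`) are made up for illustration, and the code assumes both documents were parsed with a namespace-aware parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class SimpleDeepEqual {
    private final boolean ignoreWhitespace; // skip whitespace-only text nodes
    private final boolean ignoreComments;   // skip comment nodes

    SimpleDeepEqual(boolean ignoreWhitespace, boolean ignoreComments) {
        this.ignoreWhitespace = ignoreWhitespace;
        this.ignoreComments = ignoreComments;
    }

    // Namespace-aware parse; CDATA sections are merged into ordinary text.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setCoalescing(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Recursively compares two nodes; child order is significant, attribute order is not.
    boolean equal(Node a, Node b) {
        if (a.getNodeType() != b.getNodeType()) return false;
        if (a.getNodeType() == Node.ELEMENT_NODE) {
            if (!expandedName(a).equals(expandedName(b))) return false;
            if (!attributes(a).equals(attributes(b))) return false;
        } else if (a.getNodeType() != Node.DOCUMENT_NODE) {
            // Leaf nodes (text, comments, processing instructions, ...): compare name and value.
            return a.getNodeName().equals(b.getNodeName())
                    && Objects.equals(a.getNodeValue(), b.getNodeValue());
        }
        List<Node> ca = children(a);
        List<Node> cb = children(b);
        if (ca.size() != cb.size()) return false;
        for (int i = 0; i < ca.size(); i++) {
            if (!equal(ca.get(i), cb.get(i))) return false;
        }
        return true;
    }

    // Child nodes that remain after applying the two switches.
    private List<Node> children(Node parent) {
        List<Node> kept = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (ignoreComments && c.getNodeType() == Node.COMMENT_NODE) continue;
            if (ignoreWhitespace && c.getNodeType() == Node.TEXT_NODE
                    && c.getNodeValue().trim().isEmpty()) continue;
            kept.add(c);
        }
        return kept;
    }

    // Attributes keyed by namespace and local name; attributes in the xmlns namespace are skipped.
    private static Map<String, String> attributes(Node element) {
        Map<String, String> result = new HashMap<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())) continue;
            result.put(expandedName(attr), attr.getNodeValue());
        }
        return result;
    }

    // "{namespace}local" so that the prefix itself plays no part in the comparison.
    private static String expandedName(Node n) {
        String local = n.getLocalName() != null ? n.getLocalName() : n.getNodeName();
        return "{" + n.getNamespaceURI() + "}" + local;
    }

    public static void main(String[] args) {
        Document d1 = parse("<a x='1'><b>hi</b></a>");
        Document d2 = parse("<a x='1'>\n  <!-- note -->\n  <b>hi</b>\n</a>");
        System.out.println(new SimpleDeepEqual(true, true).equal(d1, d2));   // prints true
        System.out.println(new SimpleDeepEqual(false, false).equal(d1, d2)); // prints false
    }
}
```

The sketch only answers same/different. Producing a list of differences would mean collecting a path and a message wherever it currently returns false.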

Dheeraj Joshi · Accepted Answer · 2012-04-25 08:00:17Z

1

Well, if you just want to compare and display the differences, you can use Guiffy.

It is a good tool. If you want to do the processing in the backend, use a DOM parser: load both files into two DOM objects and compare them attribute by attribute.
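For the case in the question, where everything in the smaller file should also be present in the larger one, the attribute-by-attribute comparison could be sketched like this. The matching rule is an assumption, not something stated in the question: an element of the small document counts as found if the large document has an element with the same tag name, at least the same attributes and values, the same direct text, and, recursively, some matching child for each child element. The class and method names are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XmlSubsetCheck {

    // Parses an XML string; checked parser exceptions are wrapped to keep callers simple.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // True if 'small' is covered by 'large': same tag, same direct text,
    // every attribute of 'small' present with the same value, every child matched.
    static boolean contains(Element large, Element small) {
        if (!large.getTagName().equals(small.getTagName())) return false;
        NamedNodeMap attrs = small.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            if (!large.hasAttribute(attr.getNodeName())
                    || !large.getAttribute(attr.getNodeName()).equals(attr.getNodeValue())) {
                return false;
            }
        }
        if (!ownText(small).equals(ownText(large))) return false;
        for (Node c = small.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element && !hasMatchingChild(large, (Element) c)) return false;
        }
        return true;
    }

    // Looks for any child element of largeParent that covers smallChild (order is ignored).
    private static boolean hasMatchingChild(Element largeParent, Element smallChild) {
        for (Node c = largeParent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element && contains((Element) c, smallChild)) return true;
        }
        return false;
    }

    // Concatenates the text directly inside the element, ignoring surrounding whitespace.
    private static String ownText(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE || c.getNodeType() == Node.CDATA_SECTION_NODE) {
                sb.append(c.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Document abc = parse("<order id='7'><item>pen</item></order>");
        Document big = parse("<order id='7' date='2012-04-25'>"
                + "<item>ink</item><item>pen</item><note>x</note></order>");
        // The larger document covers the smaller one, so this prints true.
        System.out.println(contains(big.getDocumentElement(), abc.getDocumentElement()));
    }
}
```

Note that this is an existence check, not a one-to-one pairing: two identical children in the small file can both be matched by a single child in the large file. If repeated elements need to be counted, the matching would have to remember which children were already used.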

Nikolay Kasyanov · Accepted Answer · 2012-04-25 08:07:18Z

0

It's a bit of overkill, but if your XML has a schema, you can convert it into an EMF metamodel and then use EMF Compare to compare the files.
