2

I'm trying to create a program in Java that loads two XML files (one is an updated version of the other) into main memory. It will then compare the files and count the number of differences between each pair of corresponding nodes (ignoring whitespace). Later on the program will do more with the differences, but for now I'm unsure how to start comparing nodes from two separate files. Any suggestions would be much appreciated.

3 Answers

1

My first suggestion is that you could use XMLUnit:

Reader expected = new FileReader(...);
Reader tested = new FileReader(...);
Diff diff = XMLUnit.compareXML(expected, tested);

1

For an algorithm that computes signatures (hashes) at each node to facilitate comparison, see "Detecting Changes in XML Documents".
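To make the signature idea concrete, here is a minimal sketch using only the JDK's DOM parser. It is not the algorithm from the paper, just an illustration of hashing each element bottom-up so that two subtrees can be compared by comparing one string. The class name NodeSignatures, the helper names, the choice of SHA-256, and the sample XML are all assumptions made for this example.

```java
import java.io.StringReader;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

class NodeSignatures {

    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Returns a SHA-256 hex signature covering the element's name, attributes,
    // trimmed text, and the signatures of its child elements in document order.
    static String signature(Element element) throws Exception {
        StringBuilder sb = new StringBuilder(element.getNodeName());

        // Sort attributes by name, since attribute order carries no meaning in XML.
        TreeMap<String, String> attributes = new TreeMap<>();
        NamedNodeMap map = element.getAttributes();
        for (int i = 0; i < map.getLength(); i++) {
            attributes.put(map.item(i).getNodeName(), map.item(i).getNodeValue());
        }
        sb.append(attributes);

        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                sb.append('[').append(signature((Element) child)).append(']');
            } else if (child instanceof Text) {
                // Trim each text node so indentation does not affect the hash.
                sb.append(child.getNodeValue().trim());
            }
        }

        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(sb.toString().getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, digest).toString(16);
    }

    public static void main(String[] args) throws Exception {
        Element oldRoot = parseRoot("<order id='1'><item>pen</item><qty>2</qty></order>");
        Element reindented = parseRoot("<order id='1'>\n  <item>pen</item>\n  <qty>2</qty>\n</order>");
        Element changed = parseRoot("<order id='1'><item>pen</item><qty>3</qty></order>");

        System.out.println(signature(oldRoot).equals(signature(reindented))); // true
        System.out.println(signature(oldRoot).equals(signature(changed)));    // false
    }
}
```

If the root signatures match, the documents are equal under this definition and nothing else needs checking. If they differ, you can descend only into the children whose signatures differ. A real implementation would probably cache each element's signature instead of recomputing it.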

For change detection on XML documents where element ordering is insignificant, see "X-Diff: An Effective Change Detection Algorithm for XML Documents". Java and C++ implementations of the X-Diff algorithm are available.


0

It depends on whether you are looking for differences between nodes (nodes added or removed) or differences inside nodes (changed content).

This code extracts every element, along with its path and the text it contains.

Assuming you have two XML Documents, run it on each one (document below is a parsed org.w3c.dom.Document):

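import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

XPath xPath = XPathFactory.newInstance().newXPath();

// Select every element in the document
String expression = "//*";
NodeList nodes = (NodeList) xPath.compile(expression).evaluate(document, XPathConstants.NODESET);

// Iterate over all of them
for (int i = 0; i < nodes.getLength(); i++) {
    Node theNode = nodes.item(i);

    if (theNode instanceof Element) {
        Element theElement = (Element) theNode;

        // PATH: walk up from the element to the document, prepending each name
        String path = "";
        Node current = theNode;
        while (current != null) {
            if (path.equals("")) {
                path = current.getNodeName();
            } else {
                path = current.getNodeName() + '/' + path;
            }
            current = current.getParentNode();

            // Stop once the Document node itself is reached
            if (current == document) {
                path = "//" + path;
                current = null;
            }
        }
        System.out.println("PATH:" + path);
        System.out.println("CONTENT=" + theElement.getTextContent());
    }
}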

PATH: the path of the element, starting from the root

CONTENT: the text content of the element, including the text of its descendants

With that, you get all the paths in your XML. You can then compare them one by one, sort them, and use other algorithms to find out whether something was inserted, and so on.

And inside each node, you can make further comparisons.
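As a rough sketch of the comparison step, the paths from both documents can be put into maps and compared key by key. This is only one possible approach and makes several assumptions not in the code above: the class and method names (PathDiff, collect, pathsOf, countDifferences) are made up, each path segment gets a position index such as item[2] so repeated siblings stay distinct, only an element's own text is compared (so a change is not counted again for every ancestor), and attributes are ignored.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

class PathDiff {

    // Records "indexed path -> trimmed own text" for the element and all its descendants.
    static void collect(Element element, String parentPath, int index, Map<String, String> out) {
        String path = parentPath + "/" + element.getNodeName() + "[" + index + "]";
        StringBuilder text = new StringBuilder();
        Map<String, Integer> seen = new HashMap<>(); // counts siblings per element name
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                int position = seen.merge(child.getNodeName(), 1, Integer::sum);
                collect((Element) child, path, position, out);
            } else if (child instanceof Text) {
                text.append(child.getNodeValue());
            }
        }
        out.put(path, text.toString().trim()); // trimming ignores indentation
    }

    // Parses an XML string and returns the path map for the whole document.
    static Map<String, String> pathsOf(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        Map<String, String> out = new LinkedHashMap<>();
        collect(root, "", 1, out);
        return out;
    }

    // Counts paths that exist on only one side or whose text differs.
    static int countDifferences(Map<String, String> oldPaths, Map<String, String> newPaths) {
        Set<String> allPaths = new HashSet<>(oldPaths.keySet());
        allPaths.addAll(newPaths.keySet());
        int differences = 0;
        for (String path : allPaths) {
            // A missing path yields null, which never equals a recorded text value.
            if (!Objects.equals(oldPaths.get(path), newPaths.get(path))) {
                differences++;
            }
        }
        return differences;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> oldPaths = pathsOf("<a><b>1</b><c>x</c></a>");
        Map<String, String> newPaths = pathsOf("<a>\n  <b>2</b>\n  <c>x</c>\n  <d/>\n</a>");
        // b changed from 1 to 2 and d was added
        System.out.println(countDifferences(oldPaths, newPaths)); // 2
    }
}
```

Because paths are positional, inserting an element in the middle of a list shifts every later sibling of the same name and inflates the count. That is where sorting or a smarter matching algorithm comes in.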

Hope it helps
