Java: get xpath of element in org.w3c.dom document

Question

I've written the code below for what I want to achieve. However, the getElementIdx() function doesn't return the proper count. There seems to be an issue with getPreviousSibling(), but I don't know why.

public static String getElementXpath(DOMElement elt){
        String path = ""; 

        try{
            for (; elt != null; elt = (DOMElement) elt.getParentNode()){
                int idx = getElementIdx(elt);
                String xname = elt.getTagName().toString();

                if (idx >= 1) xname += "[" + idx + "]";
                path = "/" + xname + path;  
            }
        }catch(Exception ee){
        }
        return path;                            
    }

    public static int getElementIdx(DOMElement elt) {
      int count = 1;
      try{

         for (DOMElement sib = (DOMElement) elt.getNextSibling(); sib != null; sib = (DOMElement) sib.getNextSibling())
            {
                if(sib.getTagName().equals(elt.getTagName())){
                    count++;
                }
            }
      }catch(Exception ee){      
      }
        return count;
    }

Please describe more closely the XPath format you want to get, or perhaps just state the purpose of the XPath expression you want the function to return. I noticed the JavaScript function handles @id specially. Do you or don't you want to pay special attention to @id?
– Lumi, Commented Mar 21, 2011 at 18:48
Also, in your first sentence, you're writing getElementByXpath(), when I think you want getXpathForElement(). Could you clarify?
– Lumi, Commented Mar 21, 2011 at 18:49
Michael, yes, that is correct. I want special attention paid to @id, so I would get an XPath in the following format: //duv[@id="meni"]/span/a[2].
– KJW, Commented Mar 21, 2011 at 21:33
com.collaxa.xml.XPathUtils.getXPathExprFromNode(Node): isn't this what you're looking for?
– Anton, Commented Jul 8, 2013 at 11:02
Sorry, wrong package in the first comment. com.ibm.wsdl.util.xml.getXPathExprFromNode(Node): isn't this what you're looking for?
– Anton, Commented Jul 8, 2013 at 11:08

Jon Skeet · Accepted Answer · 2011-03-23 08:57:50Z

Your title talks about getPreviousSibling(), but your code only uses getNextSibling() - why? I can't see why you'd want to use getNextSibling() at all... you want to find out how many elements of the same name come before the current one, not how many come after it.

The fact that you're catching and swallowing exceptions is also deeply suspicious... why would you want to do that? If you have an exception, shouldn't the method terminate with an exception?

You should also probably take account of the fact that getPreviousSibling may not return an element - it may return a text node, for example. You'll want to skip over those - currently you'd get an exception, which would terminate the loop and return the current count.
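To make the point about non-element siblings concrete, here is a minimal sketch using only the JDK's built-in parser. The class name SiblingKinds and its helper methods are made up for illustration; the sample XML is an assumption, not the asker's document. It shows that the whitespace between two elements is itself a sibling (a text node), which is why a blind cast to an element type fails.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original question or answer.
final class SiblingKinds {

    /** Parses an XML string; checked exceptions are wrapped to keep callers simple. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Returns true only if the node directly before elt exists and is an element. */
    static boolean previousSiblingIsElement(Element elt) {
        Node prev = elt.getPreviousSibling();
        return prev != null && prev.getNodeType() == Node.ELEMENT_NODE;
    }

    public static void main(String[] args) {
        Document doc = parse("<body>\n  <p>one</p>\n  <p>two</p>\n</body>");
        Element second = (Element) doc.getElementsByTagName("p").item(1);
        Node prev = second.getPreviousSibling();

        // The line break and indentation between the two <p> elements form a text node.
        System.out.println(prev.getNodeType() == Node.TEXT_NODE); // prints "true"
        // So casting prev to an element type would throw ClassCastException.
        System.out.println(prev instanceof Element);              // prints "false"
    }
}
```

With pretty-printed XML, nearly every element has such a whitespace text node before it, so the cast in the question fails on the first loop iteration and the swallowed exception leaves the count at 1.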

If these don't help, please post some sample XML, point out a node, and say what the code is currently returning (as well as posting your updated code). Just saying it doesn't return the proper count isn't nearly as useful as saying what it does return, and what you expected it to return.

EDIT: This is what I'd expect the code to look like:

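// Requires: import org.w3c.dom.Element; import org.w3c.dom.Node;
public static int getElementIndex(Element original) {
  int count = 1; // XPath positions are 1-based

  // Walk backwards over all preceding siblings, whatever their node type
  for (Node node = original.getPreviousSibling(); node != null;
       node = node.getPreviousSibling()) {
    // Skip text nodes, comments etc.; only elements are safe to cast
    if (node instanceof Element) {
      Element element = (Element) node;
      if (element.getTagName().equals(original.getTagName())) {
        count++;
      }
    }
  }

  return count;
}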

You could also use if (node.getNodeType() == Node.ELEMENT_NODE) instead of the instanceof test.
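Putting the pieces together, the following is a sketch of how the whole path could be built with plain org.w3c.dom types, using the getNodeType() test mentioned above. The class name ElementXPath and the sample document are assumptions for illustration. It always emits a position predicate and ignores namespaces, so it is a starting point rather than a complete solution.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; names are illustrative only.
final class ElementXPath {

    /** Parses an XML string; checked exceptions are wrapped to keep callers simple. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** 1-based position of elt among its same-named element siblings. */
    static int getElementIndex(Element elt) {
        int count = 1;
        for (Node n = elt.getPreviousSibling(); n != null; n = n.getPreviousSibling()) {
            // Only elements are compared; text, comments etc. are skipped.
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && ((Element) n).getTagName().equals(elt.getTagName())) {
                count++;
            }
        }
        return count;
    }

    /** Builds an absolute path such as /html[1]/body[1]/p[3]. */
    static String getElementXPath(Element elt) {
        StringBuilder path = new StringBuilder();
        // The root element's parent is the Document node, not an element,
        // so the loop stops there instead of attempting a bad cast.
        for (Node n = elt; n != null && n.getNodeType() == Node.ELEMENT_NODE;
                n = n.getParentNode()) {
            Element e = (Element) n;
            path.insert(0, "/" + e.getTagName() + "[" + getElementIndex(e) + "]");
        }
        return path.toString();
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<html><body><p>a</p>\n<div/>\n<p>b</p><p>c</p></body></html>");
        Element third = (Element) doc.getElementsByTagName("p").item(2);
        System.out.println(getElementXPath(third)); // prints "/html[1]/body[1]/p[3]"
    }
}
```

Note that no exception is caught inside the two path-building methods: if something unexpected happens, the caller sees it immediately instead of getting a silently wrong path.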

edited Mar 23, 2011 at 8:57

answered Mar 23, 2011 at 7:35

Jon Skeet

7 Comments

KJW Over a year ago

Yes, that was a typo. I meant getPreviousSibling() only; getNextSibling() doesn't make sense. Basically, that function only ever returns 1. getElementIdx(element) is supposed to count all the previous siblings that are an ELEMENT_NODE and whose tag name matches the element's tag name, so I would end up with something like /html/body/p[3].

Jon Skeet Over a year ago

@Kim: And the exception handling? If the first "previous sibling" isn't an element, that would explain why you're getting a count of 1: an exception will be thrown immediately due to the bad cast, and you'll just return the current value. Swallowing exceptions is almost always a bad idea. Also, is there any reason you're using DOMElement instead of just Element? It's better to stick to the w3c types where possible, IMO. If you could change your code into a short but complete program we could all run, that would help too...

KJW Over a year ago

Yes, the exception handling doesn't seem right here. I just wanted the for loop to continue even when an exception is raised. Yes, I understand that the exception is raised immediately and that is why I always get 1; the stack trace shows that it is basically complaining about a bad cast. DOMElement inherits from org.w3c.dom.Element. It just has extra features. I'm not sure what happens to my question when the bounty expires. Will it get deleted? I will try asking this question again with a separate bounty.

KJW Over a year ago

The thing is, I don't understand why or where the exception is being swallowed. I mean, when I don't catch things, the code just runs like nothing went wrong. How do I unswallow the exception?

Jon Skeet Over a year ago

@Kim Jong Woo: In terms of exception swallowing, I notice that you're swallowing all exceptions at two levels of code. Just stop doing that :)

crowne · Accepted Answer · 2011-03-11 06:11:21Z

Dom4j's XPath support is really good: you can access any element by providing an XPath expression.
However, I'm not sure whether the reverse is true, i.e. whether you can derive the XPath expression for a given element.

See the api at http://www.docjar.com/projects/dom4j-1.6.1-code.html

Note: avoid www.dom4j.org; it appears to have been hijacked by some kind of spammy link farm.

answered Mar 11, 2011 at 6:11

crowne

7 Comments

Chris Lercher Over a year ago

Yes, that's possible in Dom4j: Use Node.getUniquePath(). However, you'd need to convert the W3C Document to a Dom4j document first. Actually, that's very easy (just use new DOMReader().read(w3cDocument)), but it's not a very efficient solution, especially if the conversion has to be done repeatedly.

KJW Over a year ago

What is a better approach? Right now, I am just trying to translate this JavaScript function into Java: snippets.dzone.com/posts/show/3754

KJW Over a year ago

@Chris, what if Dom4j is used over and over? Would it be slow or waste memory?

Chris Lercher Over a year ago

@Kim: No, if you use Dom4j instead of W3C DOM without translating between them, then there's no performance penalty.

KJW Over a year ago

@Chris, I am translating. I am already halfway through writing a function to construct a cardinal xpath from org.w3c.dom.domdocument and also to read xpath. Should I continue this pursuit or switch over to dom4j? If the penalty for repeatedly translating a w3c dom document isn't bad, I might as well switch.

Lumi · Accepted Answer · 2011-03-23 00:13:48Z

I played around with the XOM library, which has a good API. Doing this by hand is more difficult than in XSLT. The following will get you started. Note that the sibling position handling is missing.

An interface:

package milu.calcxpath;
import nu.xom.Node;
import nu.xom.ParentNode;

public interface Calculator
{
    public void buildXPath( Node node, StringBuilder sb );
    public void buildXPath( ParentNode node, StringBuilder sb );
}

Implementing class:

package milu.calcxpath;
import nu.xom.Attribute;
import nu.xom.Comment;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.Node;
import nu.xom.ParentNode;
import nu.xom.ProcessingInstruction;
import nu.xom.Text;

public class SimpleCalculator implements Calculator
{
    @Override
    public void buildXPath( Node node, StringBuilder sb )
    {
        if ( null == node )
            return;
        if ( this.findShortCut(node, sb) )
            return;

        ParentNode parent = node.getParent();
        boolean doParents = true;
        if ( parent instanceof Element )
            if ( this.findShortCut((Element) parent, sb) )
                doParents = false;
        if ( doParents )
            this.buildXPath(parent, sb);

        if ( node instanceof Element ) {
            String name = ( (Element) node ).getLocalName();
            sb.append("/" + name);
        } else if ( node instanceof Attribute ) {
            sb.append("/@" + ( (Attribute) node ).getLocalName());
        } else if ( node instanceof Text ) {
            sb.append("/text()");
        } else if ( node instanceof Comment ) {
            sb.append("/comment()");
        } else if ( node instanceof ProcessingInstruction ) {
            sb.append("/processing-instruction()");
        }
    }

    protected boolean findShortCut( Node node, StringBuilder sb )
    {
        return false;
    }

    @Override
    public void buildXPath( ParentNode node, StringBuilder sb )
    {
        if ( null == node )
            return;
        ParentNode parent = node.getParent();
        if ( null == parent )
            return;
        else if ( parent instanceof Document ) {
            ;
        } else { // element
            if ( ! this.findShortCut((Element) parent, sb) )
                this.buildXPath(parent, sb);
        }
        sb.append("/");
        sb.append(( (Element) node ).getLocalName());
    }

    protected boolean findShortCut( Element elm, StringBuilder sb )
    {
        return false;
    }
}

Another one, extending it. This does the @id stuff.

package milu.calcxpath;
import nu.xom.Attribute;
import nu.xom.Element;
import nu.xom.Node;

public class IdShortCutCalculator extends SimpleCalculator
{
    final private static String ID = "id";

    @Override
    protected boolean findShortCut( Node node, StringBuilder sb )
    {
        if ( ! ( node instanceof Attribute ) )
            return false;
        Attribute attr = (Attribute) node;
        if ( ! attr.getLocalName().equals(ID) )
            return false;
        // Emit //@id[.='value']; a bare //@id='value' is a boolean test, not a node path
        sb.append("//@id[.='");
        sb.append(attr.getValue());
        sb.append("']");
        return true;
    }

    @Override
    protected boolean findShortCut( Element elm, StringBuilder sb )
    {
        String val = elm.getAttributeValue(ID);
        if ( null == val )
            return false;
        sb.append("//*[@id='");
        sb.append(val);
        sb.append("']");
        return true;
    }
}

Another class as a frontend:

package milu.calcxpath;

import nu.xom.Node;

public class XPathCalculator
{
    private Calculator calculator;

    public XPathCalculator(Calculator calc) {
        this.calculator = calc;
    }

    public String calculateXPath( Node node )
    {
        StringBuilder sb = new StringBuilder();
        this.calculator.buildXPath(node, sb);
        return sb.toString();
    }
}

And a test script:

package milu.calcxpath;
import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.Nodes;

public class Test
{
    public static void main( String[] args ) throws Exception
    {
        Builder builder = new Builder();
        Document doc = builder.build(Test.class.getResourceAsStream("/milu/calcxpath/eins.xml"));
        Calculator calc;
        // calc = new SimpleCalculator();
        calc = new IdShortCutCalculator();
        XPathCalculator xpc = new XPathCalculator(calc);
        show(xpc, doc, "//*");
        show(xpc, doc, "//@*");
        show(xpc, doc, "//node()");
        show(xpc, doc, "//processing-instruction()");
        show(xpc, doc, "//*//processing-instruction()");
    }

    private static void show( XPathCalculator xpc, Document doc, String xpath )
    {
        System.out.println("==========================");
        System.out.println("    " + xpath);
        Nodes nodes = doc.query(xpath);
        int size = nodes.size();
        for ( int i = 0; i < size; i++ )
            System.out.println(xpc.calculateXPath(nodes.get(i)));
    }
}

The doc I used for testing:

<Urmel>
  <!-- spukt im Schloss -->
  <Monster xmlns="urn:X-Monster">
    <Gurke>
      <?Garten eins="zwei" drei="vier"?>
      <Heini Hecht="toll">
        <eins>eins</eins>
        <zwei id="ich-bin-die-zwei">zwei</zwei>
        <drei letzt="1">drei</drei>
      </Heini>
      <!-- Es kann nur einen geben :-) -->
    </Gurke>
    <Tomate id="pomodoro">
      <eene/>
      <meene/>
      <miste>Auweia!</miste>
      <aa>
        <bb>
          <cc>dd</cc>
        </bb>
      </aa>
    </Tomate>
  </Monster>
</Urmel>

Far from perfect, but I hope this helps! :-)
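For comparison, the same @id shortcut idea can be sketched against plain org.w3c.dom, including the sibling positions that the XOM classes above leave out. This is only an illustration under stated assumptions: the class name IdAwareXPath and its methods are made up, id values are assumed to be unique and free of double quotes, and namespaces are ignored. It aims at the path shape requested in the question comments, with a position predicate only where an element has same-named siblings.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; names are illustrative only.
final class IdAwareXPath {

    /** Parses an XML string; checked exceptions are wrapped to keep callers simple. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Counts same-named element siblings before (backwards) or after elt. */
    static int countSameName(Element elt, boolean backwards) {
        int count = 0;
        Node n = backwards ? elt.getPreviousSibling() : elt.getNextSibling();
        while (n != null) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && n.getNodeName().equals(elt.getTagName())) {
                count++;
            }
            n = backwards ? n.getPreviousSibling() : n.getNextSibling();
        }
        return count;
    }

    /** Returns "" for an only child of its name, otherwise "[position]". */
    static String positionPredicate(Element elt) {
        int before = countSameName(elt, true);
        int after = countSameName(elt, false);
        return (before + after == 0) ? "" : "[" + (before + 1) + "]";
    }

    /** Walks up from elt and stops at the first ancestor-or-self carrying an id. */
    static String xpathFor(Element elt) {
        StringBuilder path = new StringBuilder();
        for (Node n = elt; n != null && n.getNodeType() == Node.ELEMENT_NODE;
                n = n.getParentNode()) {
            Element e = (Element) n;
            String id = e.getAttribute("id"); // empty string when the attribute is absent
            if (!id.isEmpty()) {
                // Assumes ids are unique, so the path can start here.
                return path.insert(0, "//" + e.getTagName() + "[@id=\"" + id + "\"]")
                        .toString();
            }
            path.insert(0, "/" + e.getTagName() + positionPredicate(e));
        }
        return path.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><div id=\"menu\">"
                + "<span><a>x</a><a>y</a></span></div></body></html>");
        Element second = (Element) doc.getElementsByTagName("a").item(1);
        System.out.println(xpathFor(second)); // prints //div[@id="menu"]/span/a[2]
    }
}
```

If the document uses a default namespace, as the test document above does, the generated names would not match in a namespace-aware XPath evaluation without extra handling.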

I've created a much shorter version, but it's not working 100%. I've updated my question.
