3

I want to parse an HTML file using Java, and I have used the DocumentBuilder class for it. My HTML contains an <img src="xyz"> tag without a closing </img> tag, which browsers allow. But when I give it to DocumentBuilder for parsing, I get this error:

The element type "img" must be terminated by the matching end-tag </img>.

Java:

DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document document = docBuilder.parse(is);

What should I do to get rid of this error?

4
  • The element type "img" must be terminated by the matching end-tag "</img>". You probably need valid HTML to parse it. Every tag must have a closing tag, or at least be self-closed as <img src="xyz" />. Commented Aug 11, 2015 at 9:36
  • HTML isn't XML and isn't subject to the same validation. Commented Aug 11, 2015 at 9:38
  • @Jakuje but <img> without a closing tag is valid HTML. For example: w3schools.com/tags/tryit.asp?filename=tryhtml_image_test Commented Aug 11, 2015 at 9:38
  • libxml2 doesn't have this problem. It shuts up about the Official Rules and just parses that HTML, subject to varying levels of validation... Commented Jul 11, 2020 at 13:17

2 Answers

5

DocumentBuilder is part of Java's XML parsing framework. An XML parser will not correctly parse HTML: the languages look similar, but XML has stricter requirements. (You've already seen one of the differences: in XML, every start tag must have a matching end tag or be written as a self-closing tag, while in HTML some elements, such as img, never take an end tag.)
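
To see that difference concretely, here is a minimal sketch that uses only the JDK's own XML parser. The class name XmlStrictnessDemo and the helper isWellFormedXml are made up for illustration; the only real API involved is the same DocumentBuilder used in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: checks whether markup is well-formed XML.
class XmlStrictnessDemo {

    /** Returns true if the JDK's XML parser accepts the markup, false if it rejects it. */
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without first printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Includes the "must be terminated by the matching end-tag" error.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // HTML-style void element: rejected by the XML parser.
        System.out.println(isWellFormedXml("<p><img src=\"xyz\"></p>"));   // false
        // Self-closing form: accepted.
        System.out.println(isWellFormedXml("<p><img src=\"xyz\" /></p>")); // true
    }
}
```

Rewriting the input so every tag is closed only works if you control the HTML; for pages you don't control, a lenient HTML parser is the more practical route.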

Try an HTML parser instead. I've heard good things about jsoup (http://jsoup.org/).


2 Comments

Thanks, I will try that. But are there any disadvantages to using jsoup?
I need to run XPath queries on the HTML. Just like (ahem) Gnome's libxml2 can do...
0

You can also use TagSoup to parse HTML as if it were XML, though that will give you SAX rather than DOM.
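
If you do need a DOM (for XPath queries, say), SAX events can be turned into one with the JDK's identity Transformer. The sketch below shows that bridge using only the standard library, so the reader it runs with is the JDK's strict XML parser as a stand-in and the sample input has to be well-formed. The class and method names (SaxToDom, toDom, jdkReader, firstImgSrc) are hypothetical. As far as I know, TagSoup's org.ccil.cowan.tagsoup.Parser implements XMLReader, so with TagSoup on the classpath it should be possible to pass an instance of it to toDom instead; that part is an assumption and is not exercised here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper class: bridges any SAX XMLReader to a DOM Document.
class SaxToDom {

    /** Builds a DOM tree from whatever SAX events the given reader emits for the markup. */
    static Document toDom(XMLReader reader, String markup) {
        try {
            DOMResult result = new DOMResult();
            // A Transformer created without a stylesheet copies its input unchanged.
            TransformerFactory.newInstance().newTransformer().transform(
                    new SAXSource(reader, new InputSource(new StringReader(markup))), result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Stand-in reader: the JDK's strict XML parser (assumption: swap in TagSoup's Parser for real HTML). */
    static XMLReader jdkReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the src of the first img element, or "" if there is none. */
    static String firstImgSrc(Document doc) {
        try {
            // local-name() matches img whether or not the elements carry a namespace.
            return XPathFactory.newInstance().newXPath()
                    .evaluate("//*[local-name()='img']/@src", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = toDom(jdkReader(), "<html><body><img src=\"xyz\" /></body></html>");
        System.out.println(firstImgSrc(doc)); // xyz
    }
}
```

The XPath expression matches on local-name() rather than the bare element name because an HTML-to-SAX parser may report elements in the XHTML namespace, in which case a plain //img would match nothing.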
