0

I want to extract the tags from an HTML document, that is, everything contained within angle brackets, including the angle brackets themselves. How can I do this in Java? Thanks.

1

2 Answers

3
<!-- Read carefully -->
<b><![CDATA[<Everything in angle brackets ("<>") is a tag?>]]></b>

... and use an HTML parser.


If you want to do it manually, iterate over the input characters and decide, for each and every < and >, whether it belongs to a tag or not. There are rules to follow for processing instructions, comments, CDATA content, and angle brackets inside attribute values(!).

Most parsers use a switch/case pattern to evaluate each token (each char, in your case).
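
To make the manual approach concrete, here is a minimal sketch of such a char-by-char scanner with a switch over a small state enum. The names TagScanner, extractTags and skipPast are made up for this illustration. It rests on simplifying assumptions: a tag starts only when < is followed by a letter or /, comments and CDATA sections are skipped whole, a processing instruction is assumed to end at the first >, and the contents of script and style elements get no special treatment. It is nowhere near a conforming HTML parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only, not a full HTML parser.
final class TagScanner {

    private enum State { TEXT, TAG, DOUBLE_QUOTED, SINGLE_QUOTED }

    // Returns every start/end tag found in the input, angle brackets included.
    static List<String> extractTags(String html) {
        List<String> tags = new ArrayList<>();
        State state = State.TEXT;
        int tagStart = -1;
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            switch (state) {
                case TEXT:
                    if (c == '<') {
                        if (html.startsWith("<!--", i)) {          // comment: skip it whole
                            i = skipPast(html, i + 4, "-->");
                            continue;
                        }
                        if (html.startsWith("<![CDATA[", i)) {     // CDATA: skip it whole
                            i = skipPast(html, i + 9, "]]>");
                            continue;
                        }
                        if (html.startsWith("<?", i)) {            // PI: assumed to end at the first '>'
                            i = skipPast(html, i + 2, ">");
                            continue;
                        }
                        // Assumption: only "<letter" or "</" opens a tag, so "1 < 2" stays text.
                        if (i + 1 < html.length()) {
                            char next = html.charAt(i + 1);
                            if (Character.isLetter(next) || next == '/') {
                                tagStart = i;
                                state = State.TAG;
                            }
                        }
                    }
                    break;
                case TAG:
                    if (c == '"') {
                        state = State.DOUBLE_QUOTED;
                    } else if (c == '\'') {
                        state = State.SINGLE_QUOTED;
                    } else if (c == '>') {
                        tags.add(html.substring(tagStart, i + 1));
                        state = State.TEXT;
                    }
                    break;
                case DOUBLE_QUOTED:                                // '>' in here is attribute data
                    if (c == '"') state = State.TAG;
                    break;
                case SINGLE_QUOTED:
                    if (c == '\'') state = State.TAG;
                    break;
            }
            i++;
        }
        return tags;
    }

    // Index just past the next occurrence of terminator, or end of input if it never occurs.
    private static int skipPast(String s, int from, String terminator) {
        int end = s.indexOf(terminator, from);
        return end < 0 ? s.length() : end + terminator.length();
    }

    public static void main(String[] args) {
        String html = "<!-- Read carefully -->\n"
                + "<b><![CDATA[<Everything in angle brackets (\"<>\") is a tag?>]]></b>";
        System.out.println(extractTags(html)); // prints [<b>, </b>]
    }
}
```

Even this small version needs four states plus three special cases, and it still gets script content, unquoted attribute edge cases and malformed markup wrong. That is the practical argument for using a real parser.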


2

I used jsoup recently. It has a nice API, is easy to use, and has given me no problems so far. Don't even try to parse HTML yourself. See Andreas_D's answer.
