Hey, I want to get the tags from an HTML document.
That is, everything contained within angle brackets, including the angle brackets themselves. How can I do this in Java?
Thanks
- java-source.net/open-source/html-parsers – PeterMmm, Mar 1, 2011 at 13:25
 
                    
    2 Answers
<!-- Read carefully -->
<b><![CDATA[<Everything in angle brackets ("<>") is a tag?>]]></b>
... and use an HTML parser.
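A minimal sketch of the parser route, assuming you want to stay on the plain JDK. jsoup is a third-party dependency, so this swaps in the JDK's own (old, HTML 3.2-era) Swing parser, `javax.swing.text.html.parser.ParserDelegator`, which reports tags through callbacks instead of making you find the brackets yourself. The class and method names `ParserTagLister`/`tagNames` are hypothetical. Note that it gives you tag names rather than the raw bracketed text, and that it silently inserts elements such as `<html>` and `<body>` when they are missing; the sketch skips those implied start tags.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class ParserTagLister {

    /** Returns the tags the JDK parser reports, e.g. "<b>" and "</b>" (names only, no attributes). */
    public static List<String> tagNames(String html) {
        List<String> tags = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                // Skip elements the parser inferred on its own (e.g. an implied <body>).
                if (!a.isDefined(IMPLIED)) {
                    tags.add("<" + t + ">");
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                tags.add("</" + t + ">");
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                // Empty elements such as <br> arrive here instead of handleStartTag.
                tags.add("<" + t + ">");
            }

            @Override
            public void handleComment(char[] data, int pos) {
                // Comments are reported separately from tags; this sketch ignores them.
            }
        };
        try (Reader reader = new StringReader(html)) {
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tags;
    }
}
```

For anything beyond a quick experiment, a maintained parser such as jsoup will cope with modern HTML far better than this HTML 3.2 parser.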
If you want to do it manually, iterate over the input characters and decide, for each < and >, whether it belongs to a tag or not. There are some rules to follow (processing instructions, comments, CDATA sections, angle brackets in attribute values(!)).
Most parsers use some kind of switch/case pattern to evaluate each token (a char, in your case).
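A minimal sketch of that manual approach, assuming plain Java with no libraries; `TagScanner` and `extractTags` are hypothetical names. It switches on a small state enum for each char, skips comments, CDATA sections and processing instructions, treats quoted attribute values as opaque (so a `>` inside quotes does not end the tag), and only starts a tag when `<` is followed by a letter or `/`. It ignores `<!DOCTYPE ...>` and does not handle `<script>`/`<style>` contents or badly malformed markup, which is exactly where a real parser earns its keep.

```java
import java.util.ArrayList;
import java.util.List;

public class TagScanner {

    private enum State { TEXT, TAG, QUOTED }

    /** Returns every tag in the input, angle brackets included, in document order. */
    public static List<String> extractTags(String html) {
        List<String> tags = new ArrayList<>();
        State state = State.TEXT;
        char quote = 0;   // the quote character that opened the current attribute value
        int start = -1;   // index of the '<' that opened the current tag
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            switch (state) {
                case TEXT:
                    // Comments, CDATA sections and processing instructions are not tags: skip them whole.
                    if (html.startsWith("<!--", i)) {
                        int end = html.indexOf("-->", i + 4);
                        i = (end < 0) ? html.length() : end + 3;
                        continue;
                    }
                    if (html.startsWith("<![CDATA[", i)) {
                        int end = html.indexOf("]]>", i + 9);
                        i = (end < 0) ? html.length() : end + 3;
                        continue;
                    }
                    if (html.startsWith("<?", i)) {
                        int end = html.indexOf("?>", i + 2);
                        i = (end < 0) ? html.length() : end + 2;
                        continue;
                    }
                    // A bare '<' in text (e.g. "a < b") does not start a tag.
                    if (c == '<' && i + 1 < html.length() && isTagStart(html.charAt(i + 1))) {
                        state = State.TAG;
                        start = i;
                    }
                    break;
                case TAG:
                    if (c == '"' || c == '\'') {
                        state = State.QUOTED;
                        quote = c;
                    } else if (c == '>') {
                        tags.add(html.substring(start, i + 1));
                        state = State.TEXT;
                    }
                    break;
                case QUOTED:
                    // Inside an attribute value only the matching quote matters; '<' and '>' are plain text.
                    if (c == quote) {
                        state = State.TAG;
                    }
                    break;
            }
            i++;
        }
        return tags;
    }

    private static boolean isTagStart(char c) {
        return Character.isLetter(c) || c == '/';
    }
}
```

Run on the two example lines above, it returns only `<b>` and `</b>`: the comment and the CDATA section are skipped, including the `<>` characters inside them.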
Comments
I used jsoup recently. Nice API, easy to use, and no problems so far. Don't even try to parse HTML yourself. See Andreas_D's answer.