Parsing words and tags from HTML in Java

Question

I need to extract all tags and words, in the order they appear, from an HTML file. Here is an example of the file:

one two thre

What I want as output is an array or a list that looks like this:

{"", "one", "two", "thre", ""}

I know that there are tools such as JTidy or Apache Tika, but these extract only the text (or only the tags) from a document. What should I do?

Mike Thomsen · Accepted Answer · 2012-02-16 16:58:24Z

Use the JSoup library for this. It makes HTML parsing in Java incredibly easy.
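
As a rough sketch of how this could look with jsoup (not necessarily what the answer had in mind): parse the markup, walk the node tree in document order, and emit a token for each opening tag, each word of text, and each closing tag. The class name `HtmlTokens` and the method `tokenize` are made up for illustration, and the choice of jsoup's XML parser is an assumption made so that jsoup does not add `<html>`, `<head>`, and `<body>` wrappers that are absent from the input.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.parser.Parser;
import org.jsoup.select.NodeVisitor;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the name is for illustration only.
final class HtmlTokens {

    // Returns tags and words in the order they appear in the markup.
    static List<String> tokenize(String html) {
        // Assumption: the XML parser is used so the tree mirrors the input
        // instead of being normalized into a full HTML document.
        Document doc = Jsoup.parse(html, "", Parser.xmlParser());
        final List<String> tokens = new ArrayList<>();

        doc.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                // Called when a node is first entered.
                if (node instanceof Document) {
                    return; // skip the synthetic root node
                }
                if (node instanceof TextNode) {
                    // Split the text into whitespace-separated words.
                    String text = ((TextNode) node).text().trim();
                    for (String word : text.split("\\s+")) {
                        if (!word.isEmpty()) {
                            tokens.add(word);
                        }
                    }
                } else if (node instanceof Element) {
                    // Attributes are dropped in this sketch.
                    tokens.add("<" + ((Element) node).tagName() + ">");
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // Called after all of the node's children have been visited.
                if (node instanceof Element && !(node instanceof Document)) {
                    tokens.add("</" + ((Element) node).tagName() + ">");
                }
            }
        });
        return tokens;
    }
}
```

Calling `HtmlTokens.tokenize("<p>one two thre</p>")` (the `<p>` tag is just a placeholder) should give a list of the form `<p>`, `one`, `two`, `thre`, `</p>`. Comments are ignored, attributes are not reproduced, and void elements such as `<br>` would also produce a closing token, so real-world input would need extra handling.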
