
I just want to parse two values from an HTML file.


There will be several list elements in the HTML file, and I want to parse two values from each of them:

a. the link text: 1, 100, 101
b. the text after the link: Swargate to Shivajinagar Circle route, Manpa bhavan to.., Kothrud depot to...

I used the code below to parse it, but I am not getting the required values; I am getting only the href value.

Please suggest a solution for this problem.

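    import java.io.Reader;
    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;
    import javax.swing.text.MutableAttributeSet;
    import javax.swing.text.html.HTML;
    import javax.swing.text.html.HTMLEditorKit;
    import javax.swing.text.html.parser.ParserDelegator;

    // Wrapper class and main method added only so the snippet compiles on its own.
    public class HrefParser {
        public static void main(String[] args) throws Exception {
            String html =
                "<li/><a href=r361.html>1</a> Swargate to Shivajinagar Circle route" +
                " <li/><a href=r511.html>100</a> Manpa bhavan to Hinjewadi phase 3" +
                "<li/><a href=r572.html>101</a> Kothrud depot to Kondhava Bu";

            Reader reader = new StringReader(html);
            HTMLEditorKit.Parser parser = new ParserDelegator();
            final List<String> links = new ArrayList<String>();

            parser.parse(reader, new HTMLEditorKit.ParserCallback() {
                // Called for every opening tag; only the href of <a> tags is collected.
                @Override
                public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                    if (t == HTML.Tag.A) {
                        Object link = a.getAttribute(HTML.Attribute.HREF);
                        if (link != null) {
                            links.add(String.valueOf(link));
                        }
                    }
                }
            }, true);

            reader.close();
            System.out.println(links);
        }
    }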

UPDATE:

Now I am getting the text inside the a tags with the code below (using the jsoup library):

    AssetManager assetManager = getAssets();
    InputStream ims = assetManager.open("index.html");
    Document doc = Jsoup.parse(ims, "UTF-8", "btc.com");
    Elements busNum = doc.getElementsByTag("a");
    pTagString = busNum.html();

    Log.i("hh", " onPostExecute =" + pTagString);

Now I want to get the text outside the a tag, e.g. "Swargate to Shivajinagar Circle route".

Does anybody know a method for this, or have any ideas?

  • That does not look like valid HTML. Are you sure this is the input you need to parse? Commented Oct 9, 2012 at 13:30
  • I just took the code from the HTML that I want to parse. Right now I want to check whether I can parse the required values, that's it. When I parse the above code I get a result like this: [r361.html, r511.html, r572.html] Commented Oct 9, 2012 at 13:36
  • You want the a tag content, not the href attribute, for starters. Also, @WimDeblauwe, I'm not positive, but I think that's valid HTML assuming it's in a ul tag, albeit highly discouraged. Commented Oct 9, 2012 at 13:39
  • Try overriding some of the other methods to see which callbacks are being called. For example: public void handleText(char[] data, int pos) Commented Oct 9, 2012 at 13:41
  • Exactly correct. If you have any idea about it, please share it with me. Commented Oct 9, 2012 at 13:41

1 Answer


You don't even need to use a parser for this. You could use a regular expression.

See this tutorial on regular expressions in Java.

And then you'll need something like this:

<a[^>]*>([^<]*)<[^>]*>(.*)

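as your regular expression. The first capture group holds the link text and the second holds the rest of the line after the closing tag, so you will have both values you need in no time. For input this simple, it is usually faster than parsing the HTML.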
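To illustrate, here is a minimal sketch of how the pattern above might be applied with java.util.regex. It is an adaptation, not the exact expression: the second group is changed from `(.*)` to `([^<]*)` so that it stops at the next tag, because the sample string in the question has no line breaks and a greedy `.*` would swallow every following entry. The class name `RouteRegexDemo` and the method `extractRoutes` are made up for this example, and the sketch assumes every entry has the shape `<a ...>number</a> description` with no nested tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class RouteRegexDemo {

    // Group 1: text inside the <a> tag. Group 2: text after </a>, up to the next tag.
    private static final Pattern ROUTE =
            Pattern.compile("<a[^>]*>([^<]*)</a>([^<]*)");

    // Returns one {number, description} pair per link found in the input.
    static List<String[]> extractRoutes(String html) {
        List<String[]> routes = new ArrayList<>();
        Matcher m = ROUTE.matcher(html);
        while (m.find()) {
            routes.add(new String[] { m.group(1).trim(), m.group(2).trim() });
        }
        return routes;
    }

    public static void main(String[] args) {
        // Same sample input as in the question.
        String html =
            "<li/><a href=r361.html>1</a> Swargate to Shivajinagar Circle route" +
            " <li/><a href=r511.html>100</a> Manpa bhavan to Hinjewadi phase 3" +
            "<li/><a href=r572.html>101</a> Kothrud depot to Kondhava Bu";

        for (String[] route : extractRoutes(html)) {
            System.out.println(route[0] + " -> " + route[1]);
        }
        // prints:
        // 1 -> Swargate to Shivajinagar Circle route
        // 100 -> Manpa bhavan to Hinjewadi phase 3
        // 101 -> Kothrud depot to Kondhava Bu
    }
}
```

Keep in mind that a regular expression like this is tied to the exact markup shown in the question; if the descriptions ever contain nested tags or the links gain extra wrappers, a real HTML parser is likely the safer choice.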
