I have an issue parsing HTML data. Java's String.indexOf() is extremely slow. Could anyone suggest a way to speed it up significantly?

while (counter2 <= found)
{
    number = Integer.toString(counter2);
    start = page.indexOf("<result" + number + ">") + 8 + number.length();
    end = page.indexOf("</result" + number + ">");
    if (start > 0 && end > 0)
    {
        buffer = page.substring(start, end);
    }
    page = page.substring(end, page.length());
    start = buffer.indexOf("<word>") + 6;
    end = buffer.indexOf("</word>");
    if (start > 0 && end > 0)
    {
        Word = buffer.substring(start, end);
    }
    start = buffer.indexOf("<vocabulary>") + 12;
    end = buffer.indexOf("</vocabulary>");
    if (start > 0 && end > 0)
    {
        Dictionary = buffer.substring(start, end);
    }

    start = buffer.indexOf("<id>") + 4;
    end = buffer.indexOf("</id>");
    if (start > 0 && end > 0)
    {
        ID = buffer.substring(start, end);
    }

    sqlDriver.createDictionaryWord("Wordlist", ID, Word, Dictionary);
    // counter = counter + 1;
    counter2 = counter2 + 1;
}

I need to make it run at least 5 times faster. Thanks for any help.

  • Why wouldn't you use an XmlPullParser for this? Commented Aug 19, 2017 at 4:25

2 Answers

A precompiled regex Pattern with a Matcher can be faster than indexOf() on longer Strings (for shorter Strings, indexOf() is usually the better choice). Run the regex against your text to find the index of the String pattern you are looking for.

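import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compile once, outside any loop; "regex" is your own pattern string.
Pattern pattern = Pattern.compile(regex);

public static void getIndices(String text, Pattern pattern) {

    Matcher matcher = pattern.matcher(text);

    // find() returns false when nothing matches; start()/end() would then throw.
    if (matcher.find()) {
        System.out.println("Start index: " + matcher.start());
        System.out.println("End index: " + matcher.end());
    }

}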

Note that each regex only needs to be compiled into a Pattern object once, so don't call Pattern.compile() inside a loop.
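To get the value between a pair of tags rather than just the indices, one option is a capture group. The following is only a minimal sketch: the class name TagExtractor, the method extractVocabulary, and the sample buffer are made up for illustration, and it assumes the tags are not nested.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagExtractor {
    // Compiled once and reused; DOTALL lets '.' also match line breaks inside the tag body.
    private static final Pattern VOCABULARY =
            Pattern.compile("<vocabulary>(.*?)</vocabulary>", Pattern.DOTALL);

    /** Returns the text between the tags, or null when the tags are absent. */
    public static String extractVocabulary(String buffer) {
        Matcher matcher = VOCABULARY.matcher(buffer);
        // group(1) is the part captured by the parentheses in the pattern.
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample input in the same shape as the question's data.
        String buffer = "<word>cat</word><vocabulary>animals</vocabulary><id>7</id>";
        System.out.println(extractVocabulary(buffer));
    }
}
```

The reluctant quantifier (.*?) stops at the first closing tag, so a buffer containing several vocabulary elements yields only the first one.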

2 Comments

Could you please give me an example of getting the value between these two? start = buffer.indexOf("<vocabulary>") + 12; end = buffer.indexOf("</vocabulary>"); I need the value in between as a String.
Are you parsing XML content? If so, as Michael suggested in the comments, you should use XmlPullParser.

I converted the data to XML and followed the advice to use XmlPullParser. It is a bit faster, but on some devices it still takes over a minute for a 1.7 MB file. Quite confusing.
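If the time barely changes after swapping parsers, it may be worth timing the parsing on its own, separately from the database writes, to see where the minute actually goes. The sketch below is one way to do that under stated assumptions: it scans the page once with indexOf(String, int fromIndex) instead of re-cutting the page with substring() on every iteration, and collects the rows in a list rather than calling sqlDriver. The class name ResultScanner, the helper names, and the assumption that results are numbered from 1 are not from the original code.

```java
import java.util.ArrayList;
import java.util.List;

public class ResultScanner {
    /** Returns the text between open and close, searching from 'from'; null if either tag is missing. */
    public static String between(String text, String open, String close, int from) {
        int start = text.indexOf(open, from);
        if (start < 0) return null;
        start += open.length();
        int end = text.indexOf(close, start);
        return end < 0 ? null : text.substring(start, end);
    }

    /** Collects {id, word, vocabulary} for result1..resultN, moving forward through the page only. */
    public static List<String[]> scan(String page, int found) {
        List<String[]> rows = new ArrayList<>();
        int pos = 0; // search position; the page itself is never copied
        for (int i = 1; i <= found; i++) { // assumption: numbering starts at 1
            String open = "<result" + i + ">";
            String close = "</result" + i + ">";
            int start = page.indexOf(open, pos);
            if (start < 0) break;
            start += open.length();
            int end = page.indexOf(close, start);
            if (end < 0) break;
            String block = page.substring(start, end);
            rows.add(new String[] {
                between(block, "<id>", "</id>", 0),
                between(block, "<word>", "</word>", 0),
                between(block, "<vocabulary>", "</vocabulary>", 0)
            });
            pos = end + close.length(); // continue after this result
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical two-result page in the same shape as the question's data.
        String page = "<result1><word>cat</word><vocabulary>animals</vocabulary><id>7</id></result1>"
                    + "<result2><word>oak</word><vocabulary>trees</vocabulary><id>8</id></result2>";
        long t0 = System.nanoTime();
        List<String[]> rows = scan(page, 2);
        long micros = (System.nanoTime() - t0) / 1_000;
        System.out.println(rows.size() + " rows parsed in " + micros + " us");
    }
}
```

If this part finishes quickly on the real 1.7 MB input, the remaining time is probably spent elsewhere, for example in the per-row call to createDictionaryWord, though that is only a guess without profiling.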
