String search algorithms

Question

I have a project on benchmarking String Matching Algorithms and I would like to know if there is a standard for every algorithm so that I would be able to get fair results with my experimentation. I am planning to use java's system.nanotime in getting the running time of every algorithm. Any comment or reactions regarding my problem is very much appreciated. Thanks!

What do you mean by: "standard for every algorithm"? Isn't this what you are supposed to design if your project requires you to create a benchmark? Am I missing something? — dirkgently
– dirkgently, Commented Nov 30, 2009 at 18:21
Does that mean I am allowed to create my own version of the algorithms? or should I need to follow a pseudo code from a standard governing body like NSIT? — Shamko
– Shamko, Commented Nov 30, 2009 at 18:31
A benchmark is a usually set of inputs that you run your implementation against to measure values for a set of (mostly orthogonal) parameters. Often enough, you run the benchmark against a standard implementation to see how well yours is/are doing. — dirkgently
– dirkgently, Commented Nov 30, 2009 at 18:47

Mark Bertenshaw · Accepted Answer · 2009-11-30 18:32:12Z

1

I am not entirely sure what you're asking. However, I am guessing you are asking how to get the most realistic results. You need to run your algorithm hundreds, or even thousands of iterations to get an average. It is also very important to turn off any caching that your language may do, and don't reuse objects, unless it is part of your algorithm.

answered Nov 30, 2009 at 18:32

Mark Bertenshaw

5,7093 gold badges30 silver badges43 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Tony van der Peet · Accepted Answer · 2009-12-14 07:46:58Z

I am not entirely sure what you're asking. However, another interpretation of what you are asking can be answered by trying to work out how a given algorithm performs as you increase the size of the problem. Using raw time to compare algorithms at a given string size does not necessarily allow for accurate comparison. Instead, you could try each algorithm with different string sizes and see how the algorithm behaves as string size varies.

And Mark's advice is good too. So you are running repeated trials for many different string lengths to get a picture of how one algorithm works, then repeating that for the next algorithm.

dmazzoni · Accepted Answer · 2009-12-14 07:52:12Z

1

Again, it's not clear what you're asking, but here's another thought in addition to what Tony and Mark said:

Be very careful about testing only "real" input or only "random" input. Some algorithms are tuned to do well on typical input (searching for a word in English text), while others are tuned for working well on pathologically hard cases. You'll need a huge mix of possible inputs of all different types and sizes to do a truly good benchmark.

answered Dec 14, 2009 at 7:52

dmazzoni

13.3k4 gold badges42 silver badges34 bronze badges

1 Comment

Tony van der Peet Over a year ago

How the heck did you manage to answer a 2 week old question 5 minutes after I did?!

Collectives™ on Stack Overflow

String search algorithms

3 Answers 3

Comments

Comments

1 Comment

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

1 Comment

Related