I have a project on benchmarking String Matching Algorithms and I would like to know if there is a standard for every algorithm so that I would be able to get fair results with my experimentation. I am planning to use java's system.nanotime in getting the running time of every algorithm. Any comment or reactions regarding my problem is very much appreciated. Thanks!
-
What do you mean by: "standard for every algorithm"? Isn't this what you are supposed to design if your project requires you to create a benchmark? Am I missing something?dirkgently– dirkgently2009-11-30 18:21:26 +00:00Commented Nov 30, 2009 at 18:21
-
Does that mean I am allowed to create my own version of the algorithms? or should I need to follow a pseudo code from a standard governing body like NSIT?Shamko– Shamko2009-11-30 18:31:50 +00:00Commented Nov 30, 2009 at 18:31
-
A benchmark is a usually set of inputs that you run your implementation against to measure values for a set of (mostly orthogonal) parameters. Often enough, you run the benchmark against a standard implementation to see how well yours is/are doing.dirkgently– dirkgently2009-11-30 18:47:24 +00:00Commented Nov 30, 2009 at 18:47
3 Answers
I am not entirely sure what you're asking. However, I am guessing you are asking how to get the most realistic results. You need to run your algorithm hundreds, or even thousands of iterations to get an average. It is also very important to turn off any caching that your language may do, and don't reuse objects, unless it is part of your algorithm.
Comments
I am not entirely sure what you're asking. However, another interpretation of what you are asking can be answered by trying to work out how a given algorithm performs as you increase the size of the problem. Using raw time to compare algorithms at a given string size does not necessarily allow for accurate comparison. Instead, you could try each algorithm with different string sizes and see how the algorithm behaves as string size varies.
And Mark's advice is good too. So you are running repeated trials for many different string lengths to get a picture of how one algorithm works, then repeating that for the next algorithm.
Comments
Again, it's not clear what you're asking, but here's another thought in addition to what Tony and Mark said:
Be very careful about testing only "real" input or only "random" input. Some algorithms are tuned to do well on typical input (searching for a word in English text), while others are tuned for working well on pathologically hard cases. You'll need a huge mix of possible inputs of all different types and sizes to do a truly good benchmark.