3

I want to implement an algorithm in Java to find the nearest similar Strings.

I have station_names in mysql database like - 23 ST, 233 ST, 21 ST, 14 St Times Sq, 24 ST

and if user enters a search string like 23rd station then I should return 23 ST and 233 ST or if user enters like Times Square then result should be 14 St Times Sq.

I found many algorithms on internet but I m confused to which one to use.

Can you please suggest me the best algorithm that can I implement in Java?

Thanks in advance

5
  • 1
    "Can you please suggest me the best algorithm" I'd typically go for the one with polka-dots, since it is prettier. Of course, your definition of 'better' might not include the visual effect, so why don't you tell us what you mean by better? Commented Dec 26, 2012 at 12:40
  • Thanks Andrew for ur reply, best algorithm means that will result most similar strings that user want to search, e.g. for 23 ST user can give search strings like 23rd Station / 23 Station / 23rd St ect Commented Dec 26, 2012 at 12:45
  • en.wikipedia.org/wiki/String_searching_algorithm talks about some of the popular algorithm but you need to implement them in Java Commented Dec 26, 2012 at 13:02
  • Might help. not sure. stackoverflow.com/q/2891514/1135954 Commented Dec 26, 2012 at 13:02
  • Thank you @AurA and mtk for your suggestions Commented Dec 26, 2012 at 13:17

2 Answers 2

2

To answer your question, there is no best algorithm in general, only the one that works best in your specific case.

You will want to define one or more metrics to measure differences between the input and the strings you have in the DB and then sort the results by score (see String metric).

The problem is that the most similar string is not always the closest address. That's why I said you have to define your own metric.

Sign up to request clarification or add additional context in comments.

Comments

1

There are many possible ways to do this. For example you might say that 21 ST is closer to 23rd station than 233 ST. You have to work out what you want and find the approach which will match it best.

It is likely you might need multiple approaches and then score the results. This is what I would do.

You can test the different approach by providing a large sample data test suite and finding out which of the approaches (or a combination) give you the highest success rate.

2 Comments

Thanks Peter for your answer, I want to return the most similar strings that user want to search, for e.g. for 23 ST**(actual station name) user can enter search strings - **23rd Station / 23 Station / 23rd St
Can you define "most similar"? While it is something most people have an idea of, for a computer you need to define it formally.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.