0

I am looking for a algorithm for string processing, I have searched for it but couldn't find a algorithm that meets my requirements. I will explain what the algorithm should do with an example.

There are two sets of word sets defined as shown below:

**Main_Words**: swimming, driving, playing
**Words_in_front**: I am, I enjoy, I love, I am going to go

The program will search through a huge set of words as soon it finds a word that is defined in Main_Words it will check the words in front of that Word to see if it has any matching words defined in Words_in_front.

i.e If the program encounters the word "Swimming" it has to check if the words in front of the word "Swimming" are one of these: I am, I enjoy, I love, I am going to go.

Are there any algorithms that can do this?

8
  • It depends ... what approach have you already tried? What language would you use to implement this? Commented Mar 30, 2013 at 9:36
  • I want to implement this using java. I know i can find the words defined in the main_words, i am not sure of the logic that i should use to check the words in front. Commented Mar 30, 2013 at 9:38
  • Is your "huge set of words" that you're searching through a long string of text, or literally a set? Also, how long might the longest words_in_front member be? Commented Mar 30, 2013 at 9:52
  • From how you described the problem, I'd say you could simply use if and .equals(). What's wrong with that? Commented Mar 30, 2013 at 11:25
  • @Noyo Its a text file filled with words Commented Mar 30, 2013 at 12:38

3 Answers 3

1

A straightforward way to do this would be to just do a linear scan through the text, always keeping track of the last N+1 words (or characters) you see, where N is the number of words (or characters) in the longest phrase contained in your words_in_front collection. When you have a "main word", you can just check whether the sequence of N words/characters before it ends with any of the prefixes you have.

This would be a bit faster if you transformed your words_in_front set into a nicer data structure, such as a hashmap (perhaps keyed by last letter in the phrase..) or a prefix/suffix tree of some sort, so you wouldn't have to do an .endsWith over every single member of the set of prefixes each time you have a matching "main word." As was stated in another answer, there is much room for optimization and a few other possible implementations, but there's a start.

Sign up to request clarification or add additional context in comments.

Comments

1

Create a map/dictionary/hash/associative array (whatever is defined in your language) with key in Main_Words and Words_in_front are the linked list attached to the entry pointed by the key. Whenever you encounter a word matching a key, go to the table and see if in the attached list there are words that match what you have in front.

That's the basic idea, it can be optimized for both speed and space.

Comments

1

You should be able to build a regular expression along these lines:

I (am|enjoy|love|am going to go) (swimming|driving|playing)

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.