0

Need java function to find the longest duplicate substring in a string

For instance, if the input is “banana”,output should be "ana" and we have count the number of times it has appeared in this case it is 2.

The solution is as below

public class Test{
public static void main(String[] args){
System.out.println(findLongestSubstring("i like ike"));
System.out.println(findLongestSubstring("madam i'm adam"));
System.out.println(findLongestSubstring("When life hands you lemonade, make lemons"));
System.out.println(findLongestSubstring("banana"));
}

public static String findLongestSubstring(String value) {
    String[] strings = new String[value.length()];
    String longestSub = "";

    //strip off a character, add new string to array
    for(int i = 0; i < value.length(); i++){
        strings[i] = new String(value.substring(i));
    }

    //debug/visualization
    //before sort
    for(int i = 0; i < strings.length; i++){
        System.out.println(strings[i]);
    }

    Arrays.sort(strings);
    System.out.println();

    //debug/visualization
    //after sort
    for(int i = 0; i < strings.length; i++){
        System.out.println(strings[i]);
    }

    Vector<String> possibles = new Vector<String>();
    String temp = "";
    int curLength = 0, longestSoFar = 0;

    /*
     * now that the array is sorted compare the letters
     * of the current index to those above, continue until 
     * you no longer have a match, check length and add
     * it to the vector of possibilities
     */ 
    for(int i = 1; i < strings.length; i++){
        for(int j = 0; j < strings[i-1].length(); j++){
            if (strings[i-1].charAt(j) != strings[i].charAt(j)){
                break;
            }
            else{
                temp += strings[i-1].charAt(j);
                curLength++;
            }
        }
        //this could alleviate the need for a vector
        //since only the first and subsequent longest 
        //would be added; vector kept for simplicity
        if (curLength >= longestSoFar){
            longestSoFar = curLength;
            possibles.add(temp);
        }
        temp = "";
        curLength = 0;
    }

    System.out.println("Longest string length from possibles: " + longestSoFar);

    //iterate through the vector to find the longest one
    int max = 0;
    for(int i = 0; i < possibles.size();i++){
        //debug/visualization
        System.out.println(possibles.elementAt(i));
        if (possibles.elementAt(i).length() > max){ 
            max = possibles.elementAt(i).length();
            longestSub = possibles.elementAt(i);
        }
    }
    System.out.println();
    //concerned with whitespace up until this point
    // "lemon" not " lemon" for example
    return longestSub.trim(); 
}

}

11
  • 1
    Interesting question, but have you tried something? Commented Dec 15, 2010 at 18:31
  • 4
    stackoverflow.com/questions/2172033/… Commented Dec 15, 2010 at 18:33
  • @khachik,i dont know how to proceed Commented Dec 15, 2010 at 18:33
  • @Aix,do you have the java function for the same,it says use a suffix tree Commented Dec 15, 2010 at 18:34
  • 1
    @Deepak you can try to implement it as you imagine, or there is a wikipedia page about it Commented Dec 15, 2010 at 18:38

2 Answers 2

5

This is a common CS problem with a dynamic programming solution.

Edit (for lijie):

You are technically correct -- this is not the exact same problem. However this does not make the link above irrelevant and the same approach (w.r.t. dynamic programming in particular) can be used if both strings provided are the same -- only one modification needs to be made: don't consider the case along the diagonal. Or, as others have pointed out (e.g. LaGrandMere), use a suffix tree (also found in the above link).

Edit (for Deepak):

A Java implementation of the Longest Common Substring (using dynamic programming) can be found here. Note that you will need to modify it to ignore "the diagonal" (look at the Wikipedia diagram) or the longest common string will be itself!

Sign up to request clarification or add additional context in comments.

8 Comments

the problem isn't longest common substring. at least the mapping is not trivial. note that this problem has only 1 input string, whereas the LCS problem is getting the longest common substring between 2 input strings.
@lijie Thanks for keeping me on my toes. I have updated the answer.
@pst,i need the longest duplicate substring in a string,Will your implementation return "ana" as the answer
@lijie,will the answer return "ana" as answer
@Deepak: yes it will. however, it (the DP solution) is not the most efficient algorithm possible (that would be suffix trees: O(n))
|
1

In Java : Suffix Tree.

Thanks to the ones that have found how to solve it, I didn't know.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.