I am trying out the following former google interview question.
The Challenge Given a string S and a set of words D, find the longest
word in D that is a subsequence of S.
Word W is a subsequence of S if some number of characters, possibly
zero, can be deleted from S to form W, without reordering the
remaining characters.
Note: D can appear in any format (list, hash table, prefix tree, etc.
For example, given the input of S = "abppplee" and D = {"able", "ale",
"apple", "bale", "kangaroo"} the correct output would be "apple"
The words "able" and "ale" are both subsequences of S, but they are
shorter than "apple". The word "bale" is not a subsequence of S
because even though S has all the right letters, they are not in the
right order. The word "kangaroo" is the longest word in D, but it
isn't a subsequence of S. Learning objectives This question gives you
the chance to practice with algorithms and data structures. It’s also
a good example of why careful analysis for Big-O performance is often
worthwhile, as is careful exploration of common and worst-case input
conditions
My approach uses greedy algorithm.
1.Sort D in descending order(longest word first)
2.Find the index of first character in a word
3.Scan S from indexOfFirstCharacter to find other characters in the word
4.If we reach to the end of string and there are still characters remaining to be seen in a word then the word is not found
5.Repeat 2-4 until we find a word
from collections import deque
D = ["able","ale", "apple", "bale", "kangaroo"]
"""
TC : DlogD
"""
if len(D) > 1:
D.sort(reverse=True)
S = "abppplee"
s_dict_first = {}
"""
TC : O(S)
"""
for i,s in enumerate(S):
if s not in s_dict_first: s_dict_first[s] = i
"""
TC : O(1)
"""
#return first index of char
def getindex(char):
if char in s_dict_first:
return s_dict_first[char]
return -1
"""
TC : O(w + w +S) = O(w + s)
"""
def isSubstringPresent(startIndex,word):
#word is too big to be found in S
if len(word) - 2 > len(S) - 1 - startIndex:
return False
remaining_chars = list(word[1:])
char_queue = deque()
for char in remaining_chars:
char_queue.append(char)
for i in range(startIndex,len(S)):
if len(char_queue) == 0:return True
if S[i] == char_queue[0]:
char_queue.popleft()
return len(char_queue) == 0
"""
TC : O(DlogD + D(w + s))
"""
for word in D:
first_char = word[0]
indexStart = getindex(first_char) + 1
if indexStart == 0:continue
if isSubstringPresent(indexStart,word):
print(word)
break;
I am open to suggestion in improving the code / any bug in my time/space complexity analysis or any other useful comments.