I am trying out the following former Google interview question.
The Challenge Given a string S and a set of words D, find the longest word in D that is a subsequence of S.
Word W is a subsequence of S if some number of characters, possibly zero, can be deleted from S to form W, without reordering the remaining characters.
Note: D can appear in any format (list, hash table, prefix tree, etc.
For example, given the input of S = "abppplee" and D = {"able", "ale", "apple", "bale", "kangaroo"} the correct output would be "apple"
The words "able" and "ale" are both subsequences of S, but they are shorter than "apple". The word "bale" is not a subsequence of S because even though S has all the right letters, they are not in the right order. The word "kangaroo" is the longest word in D, but it isn't a subsequence of S. Learning objectives This question gives you the chance to practice with algorithms and data structures. It’s also a good example of why careful analysis for Big-O performance is often worthwhile, as is careful exploration of common and worst-case input conditions
My approach uses a greedy algorithm.
- Sort D in descending order(longest word first) 
- Find the index of first character in a word 
- Scan S from - indexOfFirstCharacterto find other characters in the word
- If we reach to the end of string and there are still characters remaining to be seen in a word then the word is not found 
- Repeat 2-4 until we find a word 
from collections import deque
D = ["able","ale", "apple", "bale", "kangaroo"]
"""
TC : DlogD 
"""
if len(D) > 1:
  D.sort(key=len,reverse=True)
S = "abppplee"
s_dict_first = {}
"""
TC : O(S) 
"""
for i,s in enumerate(S):
  if s not in s_dict_first: s_dict_first[s] = i
"""
TC : O(1)
"""
#return first index of char
def getindex(char):
  if char in s_dict_first:
    return s_dict_first[char]
  return -1
"""
TC : O(w + w +S) = O(w + s)
"""
def isSubstringPresent(startIndex,word):
  #word is too big to be found in S
  if len(word) - 2 > len(S) - 1 - startIndex:
    return False
  remaining_chars = list(word[1:])
  char_queue = deque()
  for char in remaining_chars:
    char_queue.append(char)
  for i in range(startIndex,len(S)):
    if len(char_queue) == 0:return True 
    if S[i] == char_queue[0]:
      char_queue.popleft()
  return len(char_queue) == 0
"""
TC : O(DlogD + D(w + s)) 
"""
for word in D:
  first_char = word[0]
  indexStart = getindex(first_char) + 1
  if indexStart == 0:continue
  if isSubstringPresent(indexStart,word):
     print(word)
     break;
I am open to suggestion in improving the code / any bug in my time/space complexity analysis or any other useful comments.
D = ['abcd', 'bcd']andS = 'abcd'then this code printsbcd. But the problem says that it should print the longest word that is a substring of the target, so it should printabcd. \$\endgroup\$D.sort(key=len,reverse=True)\$\endgroup\$