6

I want to make a function which checks a string for occurrences of other strings within them.
However, the sub-strings which are being checked may be interrupted within the main string by other letters.

For instance:

a = 'abcde'
b = 'ace'
c = 'acb'

The function in question should return as b being in a, but not c.

I've tried set(a). intersection(set(b)) already, and my problem with that is that it returns c as being in a.

2
  • These type of strings are called subsequences of the longer string. Commented Sep 11, 2010 at 12:59
  • This question is a special case of stackoverflow.com/questions/6877249/… The solutions there are much more efficient for solving this case as well. Commented Apr 6, 2014 at 8:42

3 Answers 3

11

You can turn your expected sequence into a regex:

import re

def sequence_in(s1, s2):
    """Does `s1` appear in sequence in `s2`?"""
    pat = ".*".join(s1)
    if re.search(pat, s2):
        return True
    return False

# or, more compactly:
def sequence_in(s1, s2):
    """Does `s1` appear in sequence in `s2`?"""
    return bool(re.search(".*".join(s1), s2))

a = 'abcde' 
b = 'ace' 
c = 'acb'

assert sequence_in(b, a)
assert not sequence_in(c, a)

"ace" gets turned into the regex "a.*c.*e", which finds those three characters in sequence, with possible intervening characters.

Sign up to request clarification or add additional context in comments.

Comments

5

how about something like this...

def issubstr(substr, mystr, start_index=0):
    try:
        for letter in substr:
            start_index = mystr.index(letter, start_index) + 1
        return True
    except: return False

or...

def issubstr(substr, mystr, start_index=0):
    for letter in substr:
        start_index = mystr.find(letter, start_index) + 1
        if start_index == 0: return False
    return True

2 Comments

I expect this to run faster than the regex-based answer. Do you have timings?
Nope no timings, just wrote it as an alternative.
3
def issubstr(s1, s2):
    return "".join(x for x in s2 if x in  s1) == s1

>>> issubstr('ace', 'abcde')
True

>>> issubstr('acb', 'abcde')
False

2 Comments

Explain the white space comment please.
The question was to find subsequence, not substring

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.