Python - Remove any element from a list of strings that is a substring of another element

Question

So starting with a list of strings, as below

string_list = ['rest', 'resting', 'look', 'looked', 'it', 'spit']

I want to remove any element from the list that is a substring of another element, giving the result for instance...

string_list = ['resting', 'looked', 'spit']

I have some code that acheives this but it's embarrassingly ugly and probably needlessly complex. Is there a simple way to do this in Python?

let's see the embarrassingly ugly version. it'll be a good... ice breaker — mhlester
– mhlester, Commented Feb 12, 2014 at 6:22
and i have never seen someone ridiculed for their code in a question — mhlester
– mhlester, Commented Feb 12, 2014 at 6:23
I asked a similar question the other day stackoverflow.com/questions/21653585/… — Darren Haynes
– Darren Haynes, Commented Feb 12, 2014 at 6:57
A similar question with possibly more performant solutions can be found here — Aran-Fey
– Aran-Fey, Commented Mar 28, 2018 at 17:54

Liyan Chang · Accepted Answer · 2016-10-04 03:06:55Z

First building block: substring.

You can use in to check:

>>> 'rest' in 'resting'
True
>>> 'sing' in 'resting'
False

Next, we're going to choose the naive method of creating a new list. We'll add items one by one into the new list, checking if they are a substring or not.

def substringSieve(string_list):
    out = []
    for s in string_list:
        if not any([s in r for r in string_list if s != r]):
            out.append(s)
    return out

You can speed it up by sorting to reduce the number of comparisons (after all, a longer string can never be a substring of a shorter/equal length string):

def substringSieve(string_list):
    string_list.sort(key=lambda s: len(s), reverse=True)
    out = []
    for s in string_list:
        if not any([s in o for o in out]):
            out.append(s)
    return out

Aaron Hall · Accepted Answer · 2016-06-01 12:55:53Z

3

Here's a possible solution:

string_list = ['rest', 'resting', 'look', 'looked', 'it', 'spit']
def string_set(string_list):
    return set(i for i in string_list 
               if not any(i in s for s in string_list if i != s))

print(string_set(string_list))

prints out:

set(['looked', 'resting', 'spit'])

Note I create a set (using a generator expression) to remove possibly duplicated words as it appears that order does not matter.

edited Jun 1, 2016 at 12:55

answered Feb 12, 2014 at 6:34

Aaron Hall♦

400k93 gold badges415 silver badges342 bronze badges

Comments

placeybordeaux · Accepted Answer · 2014-02-12 06:38:48Z

2

Another one liner:

[string for string in string_list if len(filter(lambda x: string in x,string_list)) == 1]

should be fairly readable, just not that pythonic.

answered Feb 12, 2014 at 6:38

placeybordeaux

2,2461 gold badge21 silver badges44 bronze badges

2 Comments

Hieu Over a year ago

Note for python 3, filter returns an iterator, so this may raise TypeError: object of type 'filter' has no len(). Just need to wrap filter with list: len(list(filter(lambda x: string in x,string_list))).

Hieu Over a year ago

Also, if string_list hash duplicates e.g. ['apple', 'apple']. This will return an empty list, instead of ['apple']. This behavior may or may not be wanted.

Michael0x2a · Accepted Answer · 2014-02-12 06:34:47Z

Here's one method:

def find_unique(original):
    output = []

    for a in original:
        for b in original:
            if a == b:
                continue     # So we don't compare a string against itself
            elif a in b:
                break
        else:
            output.append(a) # Executed only if "break" is never hit

    return output

if __name__ == '__main__':
    original = ['rest', 'resting', 'look', 'looked', 'it', 'split']
    print find_unique(original)

It exploits the fact that we can easily check if one string is a substring of another by using the in operator. It essentially goes through each string, checks to see if it's a substring of another, and appends itself to an output list if it isn't.

This prints out ['resting', 'looked', 'split']

Sunny Nanda · Accepted Answer · 2014-02-12 06:35:14Z

0

Here is a one-liner that does what you want:

filter(lambda x: [x for i in string_list if x in i and x != i] == [], string_list)

Example:

>>> string_list = ['rest', 'resting', 'look', 'looked', 'it', 'spit']
>>> filter(lambda x: [x for i in string_list if x in i and x != i] == [], string_list)
['resting', 'looked', 'spit']

answered Feb 12, 2014 at 6:35

Sunny Nanda

2,3821 gold badge16 silver badges10 bronze badges

Comments

Shan Valleru · Accepted Answer · 2014-02-12 07:07:53Z

Here's is the efficient way of doing it (relative to the above solutions ;) ) as this approach reduces the number of comparisons between the list elements a lot. If I have a huge list, I'd definitely go with this and of course you can morph this solution into a lambda function to make it look small:

string_list = ['rest', 'resting', 'look', 'looked', 'it', 'spit']
for item in string_list: 
  for item1 in string_list:
    if item in item1 and item!= item1:
      string_list.remove(item)

print string_list

Output:

>>>['resting', 'looked', 'spit']

Hope it helps !

bhargavkartik · Accepted Answer · 2021-06-02 16:08:17Z

0

Here's an un-optimal way, only use if the lists are small:

for str1 in string_list:
    for str2 in string_list:
        if str1 in str2 and str1 != str2:
            string_list.remove(str1)

edited Jun 2, 2021 at 16:08

bhargavkartik

156 bronze badges

answered Feb 12, 2014 at 6:43

Carl Bellingan

1,49012 silver badges7 bronze badges

Comments

aiwabdn · Accepted Answer · 2017-12-07 11:29:22Z

-1

Here's another way to do it. Assuming you have a sorted list to start with and you don't have to do the sieving inplace, we can just choose the longest strings in one pass:

string_list = sorted(string_list)
sieved = []
for i in range(len(string_list) - 1):
    if string_list[i] not in string_list[i+1]:
        sieved.append(string_list[i])

answered Dec 7, 2017 at 11:29

aiwabdn

2,2401 gold badge15 silver badges7 bronze badges

Collectives™ on Stack Overflow

Python - Remove any element from a list of strings that is a substring of another element

8 Answers 8

1 Comment

Comments

2 Comments

Comments

Comments

Comments

Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

8 Answers 8

1 Comment

Comments

2 Comments

Comments

Comments

Comments

Comments

Comments

Linked

Related