Remove element from list if part of string is duplicate

Question

I want to remove strings from my list when they duplicate part of another string in the list. For example, given this list:

ls = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']

I want output as

ls_out = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004']

That is, '02/27' is dropped because it already appears in '02/27/1960'.

(Note: I'm not sure whether this question is a duplicate.)

Do you only need this to work with dates in that format, or arbitrary strings?
– Max Feng, Commented Jun 23, 2016 at 1:57
Hello @Max Feng, right now I would like to do this in the given format. Thanks!
– titipata, Commented Jun 23, 2016 at 1:59

Iron Fist · Accepted Answer · 2016-06-23 14:29:15Z

3

This can also be solved with a for loop and the built-in any function:

>>> ls
['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']
>>>
>>> ls_out = []
>>>
>>> for x in ls:
...     if not any([x in item for item in ls_out]):
...         ls_out.append(x)
...
>>> ls_out
['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004']

Or, equivalently, with all (starting again from an empty ls_out):

>>> ls_out = []
>>>
>>> for x in ls:
...     if all([x not in item for item in ls_out]):
...         ls_out.append(x)
...
>>> ls_out
['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004']
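
Note that the loop only works when each full string appears before its shorter fragment, as it does in the question's list. The following is a minimal sketch that lifts that restriction by visiting longer strings first; the function name drop_contained is hypothetical, and the generator expression inside any is a stylistic choice rather than a requirement.

```python
def drop_contained(items):
    """Keep each string unless it is a substring of a string already kept."""
    kept = []
    # Visit longer strings first so a short fragment can never be kept
    # ahead of the full string that contains it. sorted() is stable, so
    # strings of equal length stay in their original relative order.
    for x in sorted(items, key=len, reverse=True):
        # Generator expression: any() stops at the first match.
        if not any(x in item for item in kept):
            kept.append(x)
    return kept


# Fragments deliberately placed before some of the full dates.
ls = ['02/27', '02/27/1960', '07/21/2004', '07/21', '08/13/2004', '09/12/2004', '08/13']
print(drop_contained(ls))
```

Because of the sort, the result is ordered by length rather than by original position, and exact duplicates are dropped too, since a string is always a substring of itself.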

edited Jun 23, 2016 at 14:29

answered Jun 23, 2016 at 2:14

Iron Fist


4 Comments

titipata Over a year ago

Thanks so much @Iron Fist, this works really well for me!

Iron Fist Over a year ago

@titipat .. no pblm .. ;)

juanpa.arrivillaga Over a year ago

Generator comprehension rather than list comprehension with the any approach seems like it would be the most efficient. Also, to me it seems like checking x not in ls_out is overkill. Better to just use: if not any(x in item for item in ls_out):

Iron Fist Over a year ago

@juanpa.arrivillaga...correct...it seems redundant as the second condition is a factor for both ...Good observation .

AbrahamB · Accepted Answer · 2016-06-23 02:01:48Z

I'm not sure if this is the most efficient way to do this, but it would definitely work:

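ls = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']

# Make a real copy; plain "ls2 = ls" would only create a second name for the same list
ls2 = list(ls)

# Iterate over the copy so that removing from ls does not disturb the loops
for item in ls2:
  for dup_item in ls2:
    if item == dup_item:
      continue
    # dup_item is a shorter prefix of item: remove it if it is still present
    if item.startswith(dup_item) and dup_item in ls:
      ls.remove(dup_item)

print(ls)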

Basically, it works with two identical lists and loops through both, comparing every pair of items. If the two items are equal, it skips the pair. Otherwise, it checks whether the first item starts with the second, and if so, it removes the second (shorter) item from the list.
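
The same prefix check can also be written without removing anything from a list while looping. This is only a sketch of that idea, not the code from the answer above; the function name drop_prefixes is made up for illustration.

```python
def drop_prefixes(items):
    """Return the items that are not a prefix of some other, different item."""
    return [
        s for s in items
        # Discard s if any different string in the list starts with it.
        if not any(other != s and other.startswith(s) for other in items)
    ]


ls = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']
print(drop_prefixes(ls))
```

This keeps the original order and leaves ls untouched, at the cost of comparing every pair of strings.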

atline · Accepted Answer · 2016-06-23 02:16:34Z

1

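cache = set()  # 'MM/DD' keys seen so far

def fun(s):
    # Build the 'MM/DD' key from the first two fields
    ss = s.split('/')
    key = ss[0] + '/' + ss[1]
    if key in cache:
        return None  # falsy, so filter() drops this item
    else:
        cache.add(key)
        return s  # truthy, so filter() keeps this item

ls = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']

# list() is needed on Python 3, where filter() returns an iterator
new_ls = list(filter(fun, ls))
print(new_ls)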
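
One caveat with a module-level cache is that it keeps its contents between calls, so filtering a second list would silently drop dates seen in the first. A possible variant, sketched here under the assumption that every string has the form MM/DD or MM/DD/YYYY, keeps the set local to a function; the name unique_by_month_day is hypothetical.

```python
def unique_by_month_day(dates):
    """Keep the first string seen for each month/day pair."""
    seen = set()  # local to each call, so repeated calls start fresh
    result = []
    for s in dates:
        # Assumes at least two '/'-separated fields (MM/DD or MM/DD/YYYY).
        month, day = s.split('/')[:2]
        key = (month, day)
        if key not in seen:
            seen.add(key)
            result.append(s)
    return result


ls = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']
print(unique_by_month_day(ls))
```

As with the cache-based version, whichever string comes first for a given month/day wins, so a bare '02/27' placed before '02/27/1960' would be the one kept.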

edited Jun 23, 2016 at 2:16

answered Jun 23, 2016 at 2:03

atline


4 Comments

juanpa.arrivillaga Over a year ago

It makes more sense to use a set as a cache rather than a dictionary with a useless mapping.

titipata Over a year ago

First, thanks @atline! @juanpa.arrivillaga I see. So, I have to change cache to empty set and check if key is in cache, I guess.

atline Over a year ago

How about above? Seems set really better than hash.

titipata Over a year ago

Yeah, I did use solution above. However, thanks so much @atline :)
