
I want to remove the string elements of my list that are duplicated as part of another element. For example, given this list:

ls = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']

I want the output to be:

ls_out = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004']

That is, '02/27' should be dropped because it already appears in '02/27/1960' (and likewise for '07/21' and '08/13').

(Note: I'm not sure whether this question is a duplicate.)

  • Do you only need this to work with dates in that format, or arbitrary strings? Commented Jun 23, 2016 at 1:57
  • Hello @Max Feng, for now I only need it to work with this format. Thanks! Commented Jun 23, 2016 at 1:59

3 Answers

This can also be solved with a for loop and the built-in any() function:

>>> ls
['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']
>>> ls_out = []
>>> for x in ls:
...     if not any([x in item for item in ls_out]):
...         ls_out.append(x)
...
>>> ls_out
['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004']

OR (starting again from an empty list):

>>> ls_out = []
>>> for x in ls:
...     if all([x not in item for item in ls_out]):
...         ls_out.append(x)
...
>>> ls_out
['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004']
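Both loops above rely on each full date appearing before its shortened form in `ls`: if '02/27' came first, it would be appended before '02/27/1960' is seen, and both would be kept. A minimal sketch that removes that ordering assumption (the function name `remove_contained` is hypothetical, not part of the answer) visits the strings longest first. A string can only occur inside one at least as long, so by the time a short string is checked, every string that could contain it has already been considered. The survivors are then returned in their original order:

```python
def remove_contained(strings):
    """Drop every string that occurs inside a string already kept.

    Hypothetical helper: strings are visited longest first, so the
    result does not depend on the order of the input list.
    """
    # Indices sorted by string length, longest first. Python's sort is
    # stable even with reverse=True, so equal-length strings keep
    # their original relative order.
    order = sorted(range(len(strings)), key=lambda i: len(strings[i]), reverse=True)
    kept = []         # strings kept so far
    kept_idx = set()  # their positions in the input list
    for i in order:
        s = strings[i]
        if not any(s in k for k in kept):
            kept.append(s)
            kept_idx.add(i)
    # Return the survivors in their original order.
    return [s for i, s in enumerate(strings) if i in kept_idx]


# '02/27' now comes *before* '02/27/1960' and is still removed.
ls = ['02/27', '02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '07/21', '08/13']
print(remove_contained(ls))
# ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004']
```

Sorting costs O(n log n), but the containment check is still quadratic in the worst case, like the loops above.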

4 Comments

Thanks so much @Iron Fist, this works really well for me!
@titipat no problem ;)
A generator expression rather than a list comprehension inside any() seems like it would be more efficient, since any() can then stop at the first match without building the whole list first. Also, checking x not in ls_out seems like overkill to me. Better to just use: if not any(x in item for item in ls_out):
@juanpa.arrivillaga correct, that check is redundant: if x is already in ls_out, then x in item is True for that item, so the any() condition covers both cases. Good observation.
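Following the suggestion in the comments, here is a minimal sketch of the generator-expression variant wrapped in a function (`dedupe_substrings` is a hypothetical name). It keeps the original loop's behavior, including its dependence on input order:

```python
def dedupe_substrings(strings):
    """Keep each string unless it occurs inside a string already kept."""
    kept = []
    for x in strings:
        # Generator expression: any() short-circuits on the first match.
        # Exact duplicates are caught too, because x in x is True, so a
        # separate `x not in kept` check is unnecessary.
        if not any(x in item for item in kept):
            kept.append(x)
    return kept


ls = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']
print(dedupe_substrings(ls))
# ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004']
```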

I'm not sure this is the most efficient way to do it, but it works:

ls = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']

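ls2 = list(ls)  # an independent copy; `ls2 = ls` would just be a second name for the same list

# Loop over the copy and remove from the original, so no list is
# modified while it is being iterated over.
for item in ls2:
  for dup_item in ls2:
    if item == dup_item:
      continue
    # item starts with dup_item, so the shorter dup_item is redundant
    if item.startswith(dup_item) and dup_item in ls:
      ls.remove(dup_item)

print(ls)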

Basically, it compares every pair of items in the list. If the two items are equal, it skips the pair; otherwise, if one item starts with the other, it removes the shorter one from `ls`.
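The same prefix test can also be written without mutating any list: build a new list that keeps an item unless some *other* item starts with it. This is a sketch rather than the answer's code, and `drop_prefixes` is a hypothetical name. Because every item is compared against all the others, the result does not depend on the input order, but it still performs O(n²) comparisons. Like the loop above, it keeps exact duplicates, since equal items are skipped.

```python
def drop_prefixes(items):
    """Return the items that are not a prefix of some other, different item."""
    return [
        x for x in items
        # Skip x itself (and exact copies of it); drop x only if a
        # different item starts with it.
        if not any(other != x and other.startswith(x) for other in items)
    ]


ls = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']
print(drop_prefixes(ls))
# ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004']
```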

Comments

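cache = set()  # MM/DD keys seen so far

def fun(s):
    ss = s.split('/')
    key = ss[0] + '/' + ss[1]  # '02/27/1960' -> '02/27'
    if key in cache:
        return None  # falsy: filter() drops this string
    else:
        cache.add(key)
        return s  # truthy: filter() keeps this string

ls = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']

new_ls = list(filter(fun, ls))  # in Python 3, filter() returns an iterator
print(new_ls)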
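One drawback of this approach is that `cache` is a module-level global, so filtering a second list with `fun` would also drop every date whose month/day was already seen during the first call. A minimal sketch that keeps the set local to each call (`dedupe_by_month_day` is a hypothetical name), assuming, like the answer, that every string starts with `MM/DD`:

```python
def dedupe_by_month_day(dates):
    """Keep the first string for each MM/DD key and drop later ones."""
    seen = set()  # fresh set per call, so repeated calls don't interfere
    result = []
    for s in dates:
        key = '/'.join(s.split('/')[:2])  # '02/27/1960' -> '02/27'
        if key not in seen:
            seen.add(key)
            result.append(s)
    return result


ls = ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004', '02/27', '07/21', '08/13']
print(dedupe_by_month_day(ls))
# ['02/27/1960', '07/21/2004', '08/13/2004', '09/12/2004']
print(dedupe_by_month_day(ls))  # same result again: no state is shared between calls
```

Like the original, this keeps the *first* string for each key, so it relies on full dates appearing before their short forms. It would also drop a second full date with the same month and day but a different year (e.g. '02/27/2001' after '02/27/1960').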

4 Comments

It makes more sense to use a set as a cache rather than a dictionary with a useless mapping.
First, thanks @atline! @juanpa.arrivillaga I see. So I should change cache to an empty set and check whether key is in cache, I guess.
How about the version above? A set does seem much better than a dict here.
Yeah, I ended up using the solution above. Thanks so much anyway, @atline :)
