Removing duplicates in a Python list

Question

I have the following list of titles:

titles = ['Saw (US)', 'Saw (AU)', 'Dear Sally (SE)']

How would I get the following:

titles = ['Saw (US)', 'Dear Sally (SE)']

Basically, I need to remove the duplicate titles. It doesn't matter which territory is shown, as long as there is one (i.e., I can remove any duplicate).

Here is what I have tried, unsuccessfully:

[title for title in localized_titles if title.split(' (')[0] not in localized_titles]

Possible duplicate of How do you remove duplicates from a list in Python whilst preserving order?
– Martijn Pieters, Commented May 14, 2013 at 21:49
@MartijnPieters, I don't think so, since the items in the list aren't exact duplicates; they only become duplicates after some filtering.
– noio, Commented May 14, 2013 at 21:50
@MartijnPieters: This one is unique in that he needs to remove duplicates, but has to do so on elements that first need some string manipulation run on them.
– Jeremy Pridemore, Commented May 14, 2013 at 22:01

Peter DeGlopper · Accepted Answer · 2013-05-14 21:55:43Z

2

I'm not sure this is the most elegant solution, but it should work - you can use your non-territory version of the title as a dict key.

unique_titles = dict((title.rsplit(' (', 1)[0], title) for title in titles)

Or if you need to preserve order, an OrderedDict.

unique_titles.values() would be the titles including territories (one per title).

This uses rsplit with its optional second argument set to 1, so the string is split at most once, and the search for the parenthesis starts from the end of the string rather than the beginning.
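
A minimal sketch of the dict-keyed approach, assuming Python 3.7 or later, where a plain dict preserves insertion order. The helper name dedupe_by_stem is hypothetical. It uses setdefault so that the first territory seen for each title is kept, whereas the one-liner above keeps the last one.

```python
def dedupe_by_stem(titles):
    """Keep one title per stem, where the stem is the text before the final ' ('."""
    unique = {}
    for title in titles:
        stem = title.rsplit(' (', 1)[0]
        # setdefault stores the title only if the stem is not already a key
        unique.setdefault(stem, title)
    return list(unique.values())


titles = ['Saw (US)', 'Saw (AU)', 'Dear Sally (SE)']
result = dedupe_by_stem(titles)
print(result)  # ['Saw (US)', 'Dear Sally (SE)']
```

Since any territory is acceptable to the asker, either the first-wins or the last-wins behaviour would do.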

edited May 14, 2013 at 21:55

answered May 14, 2013 at 21:50


David542 · Accepted Answer · 2013-05-14 22:00:16Z

1

Here's a roundabout way of getting there:

localized_titles, existing_stems = [], []
for item in titles:
    # the stem is the title without its territory suffix
    stem = item.split(' (')[0]
    if stem not in existing_stems:
        existing_stems.append(stem)
        localized_titles.append(item)

answered May 14, 2013 at 22:00


jamylak · Accepted Answer · 2013-05-14 23:14:17Z

1

>>> from collections import OrderedDict
>>> titles = ['Saw (US)', 'Saw (AU)', 'Dear Sally (SE)']
>>> list(OrderedDict((t.rpartition(' (')[0], t) for t in titles).values())
['Saw (AU)', 'Dear Sally (SE)']

answered May 14, 2013 at 23:14


noio · Accepted Answer · 2013-05-15 09:18:43Z

1

If that is really the exact format of your titles, make sure that your localized_titles is right:

generic_titles = [t.split('(')[0] for t in titles]
titles = [title for title in titles if title.split(' (')[0] not in generic_titles]

But, this all breaks when there are other parentheses in the titles.
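
To illustrate the point about extra parentheses, here is a small comparison of split and rsplit. The title 'Saw (Uncut) (US)' is made up for this example and does not come from the question. The last line also shows that splitting on '(' rather than ' (' leaves a trailing space on the stem, so the two forms should not be mixed.

```python
title = 'Saw (Uncut) (US)'  # hypothetical title with extra parentheses

# split cuts at the first ' (' so the '(Uncut)' part is lost from the stem
stem_split = title.split(' (')[0]

# rsplit with a limit of 1 cuts only at the last ' (' and keeps '(Uncut)'
stem_rsplit = title.rsplit(' (', 1)[0]

# splitting on '(' without the leading space leaves a trailing space
stem_no_space = 'Saw (US)'.split('(')[0]

print(repr(stem_split))     # 'Saw'
print(repr(stem_rsplit))    # 'Saw (Uncut)'
print(repr(stem_no_space))  # 'Saw '
```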

edited May 15, 2013 at 9:18

answered May 14, 2013 at 21:48


3 Comments

Peter DeGlopper Over a year ago

Good point about other parens - using rsplit with a split limit of 1 might be safe enough?

noio Over a year ago

Yeah that might be safer, perhaps safe enough for OP :)

cmd Over a year ago

doesn't remove the duplicates

Senjai · Accepted Answer · 2013-05-14 21:50:19Z

0

Try using a dictionary to keep track of which items in the list you have already seen. Let the key in the dictionary be the item from the list, and the value be True or False depending on whether that item has been seen yet.

You can then iterate through the list, adding each new item to the dictionary and dropping any item that is already in the dictionary. It's how I do it, but I'm still learning.
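
The idea above can be sketched as follows. This is one possible reading, not necessarily what the author had in mind. It builds a new list instead of removing items while iterating, which is error-prone in Python, and it keys the dictionary on the title without its territory, since the full strings in the question are never identical. The function name remove_duplicate_titles is hypothetical.

```python
def remove_duplicate_titles(titles):
    seen = {}   # stem -> True once that stem has been encountered
    kept = []
    for title in titles:
        stem = title.rsplit(' (', 1)[0]
        if not seen.get(stem, False):
            seen[stem] = True
            kept.append(title)
    return kept


kept = remove_duplicate_titles(['Saw (US)', 'Saw (AU)', 'Dear Sally (SE)'])
print(kept)  # ['Saw (US)', 'Dear Sally (SE)']
```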

answered May 14, 2013 at 21:50


qwwqwwq · Accepted Answer · 2013-05-14 22:02:41Z

0

For the sake of code golf:

titles = ['('.join(x) for x in dict([x.split('(') for x in titles]).items()]

Assumes only one ( character per title, at the beginning of the country.

answered May 14, 2013 at 22:02


cmd · Accepted Answer · 2013-05-14 22:08:16Z

0

Fast, and preserves order:

seen = set()
[title for title in titles
 if title.split(' (')[0] not in seen and not seen.add(title.split(' (')[0])]
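
The comprehension works because set.add returns None, so `not seen.add(...)` is always true and only serves to record the stem as a side effect. As a variation, and assuming Python 3.8 or later for the assignment expression, the stem can be computed once per title:

```python
titles = ['Saw (US)', 'Saw (AU)', 'Dear Sally (SE)']

seen = set()
unique = [
    title for title in titles
    # bind the stem once, test it, then record it via the always-true add()
    if (stem := title.split(' (')[0]) not in seen and not seen.add(stem)
]
print(unique)  # ['Saw (US)', 'Dear Sally (SE)']
```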

edited May 14, 2013 at 22:08

answered May 14, 2013 at 21:50
