I have the following list of titles:

titles = ['Saw (US)', 'Saw (AU)', 'Dear Sally (SE)']

How would I get the following:

titles = ['Saw (US)', 'Dear Sally (SE)']

Basically, I need to remove the duplicate titles. It doesn't matter which territory shows, as long as it is one (i.e., I can remove any duplicate).

Here is what I have tried, unsuccessfully:

[title for title in localized_titles if title.split(' (')[0] not in localized_titles]

  • possible duplicate of How do you remove duplicates from a list in Python whilst preserving order? Commented May 14, 2013 at 21:49
  • @MartijnPieters, I don't think so, since the items in the list aren't exact duplicates; they only become duplicates after some filtering. Commented May 14, 2013 at 21:50
  • @Noio: the same techniques apply. Commented May 14, 2013 at 21:56
  • @MartijnPieters: This one is unique in that he not only needs to remove duplicates, but has to do it on elements that need some string manipulation first. Commented May 14, 2013 at 22:01
  • Does order need to be preserved? Commented May 14, 2013 at 22:11

7 Answers

I'm not sure this is the most elegant solution, but it should work: you can use the non-territory version of the title as a dict key.

unique_titles = dict((title.rsplit(' (', 1)[0], title) for title in titles)

Or, if you need to preserve order, use an OrderedDict.

unique_titles.values() then gives the titles including territories, one per title. In Python 3 this is a view, so wrap it in list() if you need an actual list.

This passes the optional maxsplit argument to rsplit to limit it to at most one split, and uses rsplit so the search for the parenthesis starts from the end of the string rather than the beginning.
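
A minimal runnable sketch of this approach, assuming Python 3.7 or later (where a plain dict keeps keys in first-insertion order, so the OrderedDict is only needed on older versions). Because later entries overwrite earlier ones under the same key, the last territory seen for each title is the one that survives.

```python
titles = ['Saw (US)', 'Saw (AU)', 'Dear Sally (SE)']

# Key each title by its text before the last " (", so "Saw (US)" and
# "Saw (AU)" both map to the key "Saw". Later entries overwrite earlier
# ones, so the last territory seen for each title is kept.
unique_titles = dict((title.rsplit(' (', 1)[0], title) for title in titles)

# values() is a view in Python 3; list() turns it into a plain list.
# Since Python 3.7, dicts keep keys in first-insertion order.
deduped = list(unique_titles.values())
print(deduped)  # ['Saw (AU)', 'Dear Sally (SE)']
```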

Here's a roundabout way of getting there:

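localized_titles, existing_stems = [], []
for item in titles:
    # Title without the " (XX)" territory suffix, e.g. "Saw"
    stem = item.split(' (')[0]
    # Keep only the first title seen for each stem
    if stem not in existing_stems:
        existing_stems.append(stem)
        localized_titles.append(item)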

>>> from collections import OrderedDict
>>> titles = ['Saw (US)', 'Saw (AU)', 'Dear Sally (SE)']
>>> list(OrderedDict((t.rpartition(' (')[0], t) for t in titles).values())
['Saw (AU)', 'Dear Sally (SE)']

If that is really the exact format of your titles, make sure that your localized_titles is right:

generic_titles = [t.split('(')[0] for t in titles]
titles = [title for title in titles if title.split(' (')[0] not in generic_titles]

But this all breaks when there are other parentheses in the titles.

3 Comments

Good point about other parens - using rsplit with a split limit of 1 might be safe enough?
Yeah that might be safer, perhaps safe enough for OP :)
doesn't remove the duplicates
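
A quick check of the rsplit suggestion above, assuming the territory is always the last parenthesised group in a title (the sample titles below are made up for illustration). Splitting at the first " (" throws away any earlier parenthesised text, so two different films can collapse to the same key; rsplit with a limit of 1 strips only the final group.

```python
# Hypothetical titles: two different films that share a name and are
# distinguished only by a parenthesised year before the territory.
titles = ['King Kong (1933) (US)', 'King Kong (2005) (US)', 'Saw (AU)']

# split(' (')[0] cuts at the FIRST " (", so the year is lost and the two
# different films collapse to the same stem.
first_split = [t.split(' (')[0] for t in titles]

# rsplit(' (', 1)[0] cuts only at the LAST " (", so only the territory
# is removed and the two films keep distinct stems.
last_split = [t.rsplit(' (', 1)[0] for t in titles]

print(first_split)  # ['King Kong', 'King Kong', 'Saw']
print(last_split)   # ['King Kong (1933)', 'King Kong (2005)', 'Saw']
```

This still breaks if the territory is not the final parenthesised group.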

Try using a dictionary to keep track of which items in the array you have already seen. Let the key in the dictionary be the value in the array, and the value either True or False depending on whether that item has been seen yet.

You can then iterate through the array, adding to the dictionary and removing items from the array if they exist in the dictionary. It's how I do it, but I'm still learning.
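
A minimal sketch of the dictionary idea, with two adjustments that are assumptions rather than part of the answer: the key is the title with its territory stripped (so "Saw (US)" and "Saw (AU)" count as the same item), and a new list is built instead of deleting from the original while iterating over it, since removing items from a Python list inside a for loop over that list makes the loop skip elements. `dedupe_by_title` is a hypothetical helper name.

```python
def dedupe_by_title(titles):
    """Keep the first title for each name, ignoring the trailing territory.

    Assumes the territory is the last " (...)" group in each title.
    """
    seen = {}      # maps title stem -> True once that stem has been kept
    result = []    # build a new list instead of deleting while iterating
    for title in titles:
        stem = title.rsplit(' (', 1)[0]
        if not seen.get(stem, False):
            seen[stem] = True
            result.append(title)
    return result


titles = ['Saw (US)', 'Saw (AU)', 'Dear Sally (SE)']
print(dedupe_by_title(titles))  # ['Saw (US)', 'Dear Sally (SE)']
```

A set would work just as well as a dict of booleans here, since only membership matters.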

For the sake of code golf:

titles = ['('.join(x) for x in dict([x.split('(') for x in titles]).items()]

Assumes exactly one ( character per title, immediately before the territory.

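Fast, and preserves order:

seen = set()
# seen.add() returns None, so "not seen.add(...)" is always True; it just
# records the stem the first time it is encountered.
[title for title in titles
 if title.split(' (')[0] not in seen and not seen.add(title.split(' (')[0])]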
