How to delete multiple substrings?

Question

I'm working on a script that gets some information from a PGN file, a format used to describe chess games. I'm trying to copy the moves of each game separately into another file.

But sometimes there are comments, marked by '{' and '}' characters, and I would like to strip them from the string (I'm copying each line of the file into a string to make some adjustments before writing it to the output file).

An example of a string in this format would be:

'1.e4 {some comment} c5 2.Nf3 d6 3.d4 {another comment} Nxd4 {you got it}'

My first solution was simply:

my_string = my_string.replace(my_string[my_string.find('{'):my_string.find('}')], '')

Unfortunately, this stripped only the first comment, like this:

'1.e4 } c5 2.Nf3 d6 3.d4 {another comment} Nxd4 {you got it}'

The '}' that remained is not a problem; it can be deleted with:

my_string = my_string.replace('}', '')

So I tried to loop over the string:

for char in my_string:
    if char == '{':
        my_string = my_string.replace(my_string[my_string.find('{'):my_string.find('}')], '')

The very same thing happened, only the first set of comments was deleted.

Then I tried a while loop:

while my_string.find('{') != -1:
    my_string = my_string.replace(my_string[my_string.find('{'):my_string.find('}')], '')

And now I am stuck in an infinite loop...

Does anyone know how to solve this? I would also accept a solution with lists, which I could embed like this:

temp_list = list(my_string)
# solution with list manipulation
my_string = ''.join(temp_list)

What's your expected output? Why not use re.sub? re.sub(r'\{[^}]*}', '', my_string) – Avinash Raj, Mar 24, 2015 at 4:39
Everything but comments inside { and }, these characters included. Following the example, I expect '1.e4 c5 2.Nf3 d6 3.d4 Nxd4 ' – Cícero Vargas, Mar 24, 2015 at 4:42
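For comparison, here is roughly what the pattern from the first comment produces on the example string (a quick sketch; the names stripped and collapsed are only for illustration). It removes each {...} block with its braces but keeps the spaces on both sides, so double spaces remain; one possible follow-up step, assumed here rather than taken from the comment, is to collapse whitespace runs:

```python
import re

my_string = '1.e4 {some comment} c5 2.Nf3 d6 3.d4 {another comment} Nxd4 {you got it}'

# \{ matches a literal '{', [^}]* matches any run of characters other than '}'
# (newlines included), and the final } closes the comment.
stripped = re.sub(r'\{[^}]*}', '', my_string)
print(repr(stripped))  # '1.e4  c5 2.Nf3 d6 3.d4  Nxd4 '

# Collapsing runs of whitespace afterwards yields the output asked for above.
collapsed = re.sub(r'\s+', ' ', stripped)
print(repr(collapsed))  # '1.e4 c5 2.Nf3 d6 3.d4 Nxd4 '
```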

Amadan · Accepted Answer · 2015-03-24 04:42:54Z

3

Regular expressions are perfect for this.

import re
my_string = '1.e4 {some comment} c5 2.Nf3 d6 3.d4 {another comment} Nxd4 {you got it}'
re.sub(r'\s*{.*?}\s*', ' ', my_string)
# '1.e4 c5 2.Nf3 d6 3.d4 Nxd4 '

"Replace any amount of whitespace, an opening curly brace, the smallest possible amount of anything at all (except newlines), a closing curly brace, and any amount of whitespace with a single space."
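The same pattern can be written with re.VERBOSE so that each piece of that description sits next to the part of the regex it describes. This is a sketch; the name COMMENT is illustrative, and the braces are escaped only to make their literal meaning obvious. Note the assignment: re.sub returns a new string rather than modifying the old one.

```python
import re

my_string = '1.e4 {some comment} c5 2.Nf3 d6 3.d4 {another comment} Nxd4 {you got it}'

# Same pattern as above, annotated. In verbose mode, whitespace inside the
# pattern is ignored and '#' starts a comment.
COMMENT = re.compile(r"""
    \s*     # any amount of leading whitespace
    \{      # an opening curly brace
    .*?     # as few characters as possible (no newlines by default)
    \}      # a closing curly brace
    \s*     # any amount of trailing whitespace
""", re.VERBOSE)

# str objects are immutable, so keep the returned string.
my_string = COMMENT.sub(' ', my_string)
print(repr(my_string))  # '1.e4 c5 2.Nf3 d6 3.d4 Nxd4 '
```

If a line can start with a comment, the result will start with a space; calling .strip() on it afterwards is one option.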

answered Mar 24, 2015 at 4:42

Amadan


4 Comments

Jonathan Eunice Over a year ago

Nice use of *?, the ever-handy non-greedy collector. Short, sweet, and done.
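To see why the non-greedy `*?` matters, here is a small comparison of both quantifiers on the example string (a sketch; greedy and lazy are illustrative names). The greedy `.*` runs from the first '{' to the last '}' and swallows the moves in between:

```python
import re

my_string = '1.e4 {some comment} c5 2.Nf3 d6 3.d4 {another comment} Nxd4 {you got it}'

# Greedy: .* takes as much as it can, so one match spans from the first '{'
# to the LAST '}' in the string.
greedy = re.sub(r'\s*{.*}\s*', ' ', my_string)

# Non-greedy: .*? stops at the first '}' it reaches, one comment at a time.
lazy = re.sub(r'\s*{.*?}\s*', ' ', my_string)

print(repr(greedy))  # '1.e4 '
print(repr(lazy))    # '1.e4 c5 2.Nf3 d6 3.d4 Nxd4 '
```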

Cícero Vargas Over a year ago

Thank you Amadan, but it is not working; it does not delete a single character. I haven't studied regular expressions yet, so I don't know what to do. Maybe you missed something?

Amadan Over a year ago

I did not; check here (click Run). Maybe you missed something? Such as an assignment? (The above snippet will calculate the new string, but does not assign it to anything; and the old string is unchanged, of course, as Python's strings are immutable.)

Cícero Vargas Over a year ago

Oooh the assignment! Yes, now it is working! Thank you very much!!

Ming · Accepted Answer · 2015-03-24 04:47:32Z

As an additional remark to the other answer, if you are parsing a complex format (as PGN is, among many others), you should look into using a general-purpose parsing library, rather than writing your own ad-hoc parser. That will allow you to re-use shared logic that the library authors have written and debugged for you. Parsing is an extreme example of a use-case which has undergone a tremendous amount of research over the years, and by utilizing the proper library, you can benefit from this research in your own projects. This list on the official Python wiki suggests many possible options. This blog post offers a review of some popular options.

Thank you Ming, I will surely read your links. In fact, I even found a pgnparser package already on PyPI! I will try to study its code.

Community · Accepted Answer · 2017-05-23 12:10:15Z

0

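Note that your attempts leave the final } in place. This is because my_string.find('}') returns the index of the }, but a slice such as my_string[start:end] includes everything up to but not including end, so the } itself is never part of the substring you replace. On the next pass, find('}') returns that leftover }, which now sits before the next {, so the slice comes out empty and replace removes nothing. That is why the for loop deletes only the first comment and the while loop never finishes.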
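The stall can be reproduced step by step with the question's own expression (a sketch; first, second, start, and stop are illustrative names). After the first pass, the leftover } makes the second slice empty:

```python
my_string = '1.e4 {some comment} c5 2.Nf3 d6 3.d4 {another comment} Nxd4 {you got it}'

# First pass: the slice stops just before '}', so the brace survives.
first = my_string[my_string.find('{'):my_string.find('}')]
print(repr(first))  # '{some comment'
my_string = my_string.replace(first, '')
print(repr(my_string))  # '1.e4 } c5 2.Nf3 d6 3.d4 {another comment} Nxd4 {you got it}'

# Second pass: find('}') now returns the leftover brace, which lies BEFORE
# the next '{'. A slice whose start is past its stop is empty.
start, stop = my_string.find('{'), my_string.find('}')
print(start, stop)  # 24 5
second = my_string[start:stop]
print(repr(second))  # ''

# Replacing '' with '' changes nothing, so a loop that waits for
# find('{') to return -1 never makes progress.
```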

So, you need to increment the end index by 1:

my_string = my_string.replace(my_string[my_string.find('{'):my_string.find('}')+1], '')
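Put together as a complete loop, the fix might look like the sketch below. Two tweaks go beyond the one-line fix above and are assumptions of this sketch: it cuts with slices instead of replace (so identical text elsewhere in the line is left alone), and it searches for } only after the {, with a guard for an unterminated comment so the loop cannot spin forever.

```python
my_string = '1.e4 {some comment} c5 2.Nf3 d6 3.d4 {another comment} Nxd4 {you got it}'

# Repeatedly cut out the first {...} block, braces included.
while True:
    start = my_string.find('{')
    if start == -1:
        break  # no comments left
    # Look for '}' only after the '{', so a stray earlier '}' cannot
    # produce an empty slice.
    stop = my_string.find('}', start)
    if stop == -1:
        break  # unterminated comment: stop instead of looping forever
    # stop + 1 so the closing brace is removed too.
    my_string = my_string[:start] + my_string[stop + 1:]

print(repr(my_string))  # '1.e4  c5 2.Nf3 d6 3.d4  Nxd4 '
```

As with the regex from the question's comments, the spaces around each comment remain, leaving double spaces.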

As @Amadan's answer suggests, I'd probably just use regular expressions for this exercise.
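For the list-based variant the question asks about, one possible approach is a single pass over the characters with a flag that records whether we are inside a comment. This sketch fills in the question's temp_list template; kept and in_comment are illustrative names, and it assumes brace comments are never nested:

```python
my_string = '1.e4 {some comment} c5 2.Nf3 d6 3.d4 {another comment} Nxd4 {you got it}'

temp_list = list(my_string)
# Keep only characters outside {...}; the braces themselves are dropped.
kept = []
in_comment = False
for char in temp_list:
    if char == '{':
        in_comment = True   # a comment starts here
    elif char == '}':
        in_comment = False  # the comment ends here
    elif not in_comment:
        kept.append(char)
temp_list = kept
my_string = ''.join(temp_list)

print(repr(my_string))  # '1.e4  c5 2.Nf3 d6 3.d4  Nxd4 '
```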

edited May 23, 2017 at 12:10

CommunityBot


answered Mar 24, 2015 at 4:56

David C

