I am looking to write a function that breaks a string into a list of str by splitting it at various punctuation characters (e.g. , ! ?) that I specify. I know I should use the .split() method with the specific punctuation, but I can't figure out how to run the split repeatedly with each specified punctuation character so that I end up with a single list of str made up of the original str split at every punctuation character.
1 Answer
To split with multiple delimiters, you should use re.split():
import re
pattern = r"[.,!?]"  # character class; add any other delimiters inside the brackets
new = re.split(pattern, your_current_string)
Putting that in function form should be simple enough.
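For what it's worth, here is a minimal sketch of that function form. The name split_on_punctuation and the default delimiter set are my own choices for illustration, not anything from the question. Stripping whitespace and dropping empty pieces is also an assumption about what you want; remove that part if you need the raw re.split() output.

```python
import re


def split_on_punctuation(text, delimiters=".,!?"):
    """Split text at every character in delimiters (hypothetical helper)."""
    if not delimiters:
        # Nothing to split on; return the text unchanged as a one-item list.
        return [text]
    # Escape each delimiter so characters such as "." are matched literally.
    pattern = "|".join(map(re.escape, delimiters))
    # re.split() leaves empty strings where delimiters are adjacent or sit at
    # the ends of the text, so strip whitespace and drop the empty pieces.
    return [piece.strip() for piece in re.split(pattern, text) if piece.strip()]


print(split_on_punctuation("Hello!I'd like, to say something. 'World'."))
```

With the sample string from the comments this should give four pieces: Hello, I'd like, to say something, and 'World'.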
4 Comments
El Bert
Using your method I get a list of empty strings.
re.split(pattern, "Hello!I'd like, to say something. 'World'.") returns '["", "", "", "", "", "", ""]'
anon582847382
@bvidal That's because I forgot to escape the full stop (which meant it was splitting on everything); thanks for telling me. Try it again now.
Yann Vernier
It's probably a better idea to write the regex directly (pattern = r"[.,!?]"), or use re.escape: pattern='|'.join(map(re.escape, delimiters)).
anon582847382
@YannVernier I'd say that's definitely a better idea. Edited.