String regex for a particular format in Python

Question

I have strings in the following format name1 <[email protected]>. How can I use regex to pull only the name1 part out? Also, how might I be able to do this if I had multiple such names and emails, say name1 <[email protected]>, name2 <[email protected]>?

Are the emails actually surrounded by < >?

– DeepSpace, Commented Feb 25, 2021 at 18:16
@DeepSpace yes, they are.

– supersaiyajin87, Commented Feb 25, 2021 at 18:21

Mayank Porwal · Accepted Answer · 2021-02-25 18:17:59Z

3

Try using split:

In [164]: s = 'name1 <[email protected]>, name2 <[email protected]>'
In [166]: [i.split()[0] for i in s.split(',')]
Out[166]: ['name1', 'name2']

If you have just one name:

In [161]: s = 'name1 <[email protected]>'
In [163]: s.split()[0]
Out[163]: 'name1'

edited Feb 25, 2021 at 18:17

answered Feb 25, 2021 at 18:17


7 Comments

ti7 Over a year ago

I thought to do this too, but it only works if there are not additional spaces, which I doubt is guaranteed; indeed a regex about < would be much cleaner!

Mayank Porwal Over a year ago

I don't think the emails are surrounded by <>. It's just a way of representing them.

C.Nivs Over a year ago

@ti7 str.split will behave the same if there are multiple spaces. 'name1    <someemail>'.split() returns the same output as 'name1 <someemail>'.split().

Mayank Porwal Over a year ago

Yes, str.split by default handles whitespaces.
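
A quick way to check the whitespace claim above is to compare the two calls directly. This is only a sketch; the strings are placeholders modeled on the comment, not real data.

```python
# Placeholder inputs modeled on the comment above; not real addresses.
single_space = 'name1 <someemail>'
many_spaces = 'name1     <someemail>'

# Called without arguments, str.split() splits on runs of whitespace
# and ignores leading and trailing whitespace.
tokens_single = single_space.split()
tokens_many = many_spaces.split()

print(tokens_single)
print(tokens_many)
print(tokens_single == tokens_many)
```

Note that this only holds for the no-argument form; split(' ') with an explicit separator would produce empty strings between consecutive spaces.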

ti7 Over a year ago

obviously, but it will not work if the structure is like first, last <[email protected]> (which "real" emails frequently are) - the OP did not state this, but it'll be the case for any real-world collection
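
To make the concern above concrete, here is a small sketch with a hypothetical "last, first" entry (the name and address are invented for illustration). It compares the whitespace split with taking everything before the first '<' via str.partition, which is one possible alternative for a single entry, not something proposed in the thread.

```python
# Hypothetical entry in "last, first <address>" form (invented for illustration).
entry = 'Doe, John <john.doe@example.com>'

# Whitespace splitting keeps only the first token, comma included.
first_token = entry.split()[0]

# Taking everything before the first '<' keeps the whole display name.
display_name = entry.partition('<')[0].strip()

print(first_token)
print(display_name)
```

For a comma-separated list of such entries, splitting on ',' would also cut the name in half, so this sketch only covers the single-entry case.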


DeepSpace · Accepted Answer · 2021-02-25 18:35:15Z

2

You can start with (\w+)\s<.*?>(?:,\s)? (see on regex101.com), which relies on the fact that emails are surrounded by < >, and customize it as you see fit.

Note that this regex does not specifically look for emails, just for text surrounded by < >.

Don't fall down the rabbit hole of trying to specifically match emails.

import re

regex = re.compile(r'(\w+)\s<.*?>(?:,\s)?')
string = 'name1 <[email protected]>, name2 <[email protected]>'

print(regex.findall(string))

outputs

['name1', 'name2']
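
As one possible customization, the capture group can be widened from \w+ to "anything that is not an angle bracket", so that display names containing spaces or commas survive. This is a sketch under assumptions: the pattern name, the sample names, and the addresses are all hypothetical, and a display name that itself contains '<' or '>' would not be handled.

```python
import re

# Capture everything that is not an angle bracket, up to the bracketed part.
# The trailing ",?" consumes the separator between entries, if present.
NAME_BEFORE_BRACKETS = re.compile(r'\s*([^<>]+?)\s*<[^<>]*>,?')

# Hypothetical data: the names and addresses are placeholders.
entries = 'Doe, John <john.doe@example.com>, name2 <user2@example.com>'

names = NAME_BEFORE_BRACKETS.findall(entries)
print(names)
```

Like the original pattern, this still only looks for text surrounded by < > and makes no attempt to validate the address.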

edited Feb 25, 2021 at 18:35

answered Feb 25, 2021 at 18:27


1 Comment

supersaiyajin87 Over a year ago

This actually works better for me, thank you.

nahar · Accepted Answer · 2021-02-25 19:46:01Z

import re

name = re.search(r'(?<! <)\w+', 'name1 <[email protected]>')

print(name.group(0))

>>> name1

Explanation:

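(?<!...) is a negative lookbehind assertion: the pattern matches only at positions that are not immediately preceded by the text in the .... I put ' <' in the ..., so a match cannot start right after the ' <' that opens the email. Because re.search returns the first match and the name comes first in the string, the result is the name.

re.search(r'(?<!...)', string_to_search)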

https://docs.python.org/3/library/re.html
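
To see what the lookbehind does beyond the first match, it can help to run re.findall with it and compare against a lookahead, (?=...), which instead requires the word to be followed by ' <'. This is an illustrative sketch; the sample string and its address are hypothetical.

```python
import re

# Hypothetical sample; the address is a placeholder.
sample = 'name1 <user1@example.com>'

# Negative lookbehind: a match may not START directly after ' <'.
# Later words in the address can still match, one character in.
not_after_bracket = re.findall(r'(?<! <)\w+', sample)

# Lookahead: a match must be FOLLOWED by ' <', which only the name is.
before_bracket = re.findall(r'\w+(?= <)', sample)

print(not_after_bracket)
print(before_bracket)
```

With re.search both patterns return the name here, because it is the first match in the string; the difference only shows once every match is collected.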

Edit/Forgot:

To search a string that contains multiple names, use a lookahead, (?=...), which matches a word only when it is followed by ' <':

import re

regex = r"\w+(?= <)"

multi_name = "name1 <[email protected]>, name2 <[email protected]>"

# finditer yields one match object per name
for match in re.finditer(regex, multi_name):
    print(match.group())

>>> name1
>>> name2
