I have a list of links, and some of them look like

https://www.domainname
or https://domainname

I need a regex pattern that extracts only the domain name from each link. The "www." prefix breaks my current pattern:

print(re.findall("//([a-zA-Z]+)", i))
  • You can create an optional non-capturing group - re.findall(r"//(?:www\.)?([a-zA-Z]+)", i) Commented Sep 2, 2022 at 13:22
  • Maybe stackoverflow.com/questions/44021846/… can help Commented Sep 3, 2022 at 0:05
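
The optional non-capturing group suggested in the first comment could be sketched roughly as follows. The helper name `domain_name` and the sample list are assumptions made for illustration, and the character class only covers ASCII letters, as in the question's pattern.

```python
import re

# "www." is matched if present but not captured; the group captures the letters after it.
PATTERN = re.compile(r"//(?:www\.)?([a-zA-Z]+)")

def domain_name(url):
    """Return the first label after '//' (skipping 'www.'), or None if nothing matches."""
    match = PATTERN.search(url)
    return match.group(1) if match else None

# Sample links taken from the question
links = ["https://www.domainname", "https://domainname"]
names = [domain_name(link) for link in links]
print(names)  # ['domainname', 'domainname']
```

Using `re.search` instead of `re.findall` returns a single string per link rather than a one-element list, which may be more convenient when each link holds exactly one URL.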

3 Answers

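You could anchor the match at the end of the string with $, so any prefix such as "www." is ignored. This works as long as the link ends with the domain name.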

import re

url = "https://www.domainname"
url2 = "https://domainname"

for u in [url, url2]:
    print(u)
    # \w+$ matches the last run of word characters before the end of the string
    print(re.findall(r"\w+$", u))

https://www.domainname
['domainname']
https://domainname
['domainname']

My solution:

import re

l1 = ["https://www.domainname1", "https://domainname2"]
for i in l1:
    # Raw string avoids invalid-escape warnings; "//" anchors the match right after the scheme
    print(re.findall(r"//(?:www\.)?(\w+)", i))

Output:

['domainname1']
['domainname2']

import re

with open('testfile.txt', 'r') as file:
    readfile = file.read()

# Skip an optional scheme and one subdomain label, then capture the domain name and the TLD
search = re.finditer(r'(?:\w+://)?(?:\w+\.)(\w+)(\.\w+)', readfile)

for check in search:
    print(check.group(1))  # group 1 is the domain name without the TLD

Result:

domainname
example
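
Since the contents of testfile.txt are not shown, here is a minimal sketch that runs the same pattern on an in-memory string. The sample text and the helper name `find_domains` are assumptions for illustration only. Note that this pattern requires both a subdomain label (such as "www.") and a TLD, so links shaped like the ones in the question, which have no TLD, would not match.

```python
import re

# Same pattern as above: optional scheme, one subdomain label, then domain name and TLD.
PATTERN = re.compile(r"(?:\w+://)?(?:\w+\.)(\w+)(\.\w+)")

def find_domains(text):
    """Return group 1 (the domain name) of every match found in text."""
    return [m.group(1) for m in PATTERN.finditer(text)]

# Hypothetical stand-in for the contents of testfile.txt
sample = "https://www.domainname.com\nhttps://www.example.org\n"
print(find_domains(sample))                    # ['domainname', 'example']
print(find_domains("https://www.domainname"))  # [] because there is no TLD
```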
