Python Regex URL from Website / Raw Website without Https & Http [duplicate]

Question

I have Python code like this:

#!/usr/bin/env python3
from urllib.parse import urlparse

url = 'https://pastebin.com/raw/EgGZmEqY'
parsed = urlparse(url)
site = parsed.netloc  # host part of the URL, e.g. 'pastebin.com'
print(site)

Whether the URLs come from a raw page (like the Pastebin link above) or from a regular website, I want to grab just the site, without https, http, or www. For example, the raw page contains entries like the ones below, and from each of them I want to get just example.com:

https://example.com
http://example.com
www.example.com
example.com

How can I get the domain without https, http, and www? Thank you!

kenjoe41 · Accepted Answer · 2018-09-15 11:29:53Z

1

I take it that you just want the registered domain (the domain name plus its TLD), without the subdomains or scheme.

According to this Stack Overflow answer, it seems all you need is tldextract:

import tldextract

tldextract.extract('http://forums.news.cnn.com/')
# ExtractResult(subdomain='forums.news', domain='cnn', suffix='com')

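In your case, I would use this:

#!/usr/bin/env python3

import tldextract

url = 'https://www.pastebin.co.uk/raw/EgGZmEqY'

# Split the URL into subdomain, domain, and public suffix.
parsed = tldextract.extract(url)
# Join domain and suffix to get the registered domain.
domain = parsed.domain + '.' + parsed.suffix

print(domain)  # pastebin.co.uk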
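For the exact inputs in the question, a standard-library-only sketch may also be enough. This is not the tldextract approach: it only strips the scheme, the path, and one leading "www.", so other subdomains (for example forums.news.cnn.com) are kept as they are. The helper names bare_host and hosts_from_raw are made up for illustration, and the sketch assumes the raw page has already been downloaded into a string with one URL per line.

```python
from urllib.parse import urlparse


def bare_host(url):
    """Return the host of `url` without scheme, port, path, or a leading 'www.'."""
    url = url.strip()
    # urlparse only fills in the network location when a scheme or '//' is
    # present, so prepend '//' for inputs such as 'www.example.com'.
    if '://' not in url:
        url = '//' + url
    host = urlparse(url).hostname or ''
    # Drop a single leading 'www.' label; any other subdomain is kept.
    if host.startswith('www.'):
        host = host[len('www.'):]
    return host


def hosts_from_raw(text):
    """Apply bare_host to every non-empty line of a raw text dump."""
    return [bare_host(line) for line in text.splitlines() if line.strip()]


# The four forms listed in the question, as they might appear in a raw page.
raw = """https://example.com
http://example.com
www.example.com
example.com"""

print(hosts_from_raw(raw))
# ['example.com', 'example.com', 'example.com', 'example.com']
```

If the list can contain multi-part suffixes such as co.uk together with arbitrary subdomains, passing each line to tldextract.extract instead of bare_host is likely the safer choice.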

edited Sep 15, 2018 at 11:29

answered Sep 15, 2018 at 10:16


2 Comments

Tim Biegeleisen Over a year ago

You should provide code which works with the OP's exact data. Cutting and pasting from another question doesn't help much.

Rai Over a year ago

But that only works for a single domain. How can I grab the domains from a raw page or another website, like in my Pastebin link?
