103

I have a URL like:
http://abc.hostname.com/somethings/anything/

I want to get:
hostname.com

What module can I use to accomplish this?
I want to use the same module and method in Python 2.

4
  • 3
    I would imagine you could use regex. Commented May 22, 2017 at 12:52
  • 2
    You can just use str.split(), it's easy Commented May 22, 2017 at 12:54
  • url.split('/')[2] will give you 'abc.hostname.com'; you can then extract the rest using split, re, or any other method. Commented May 22, 2017 at 12:58
  • 3
    maybe a duplicate, but better answers here Commented Mar 2, 2022 at 5:03

5 Answers 5

155

For parsing the domain of a URL in Python 3, you can use:

from urllib.parse import urlparse

domain = urlparse('http://www.example.test/foo/bar').netloc
print(domain) # --> www.example.test

However, to reliably extract the registered domain (example.test in this example), you need to install a specialized library (e.g., tldextract).
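
Since the question also asks about Python 2, here is a minimal sketch of an import that should work on both versions. The helper name get_host is hypothetical, and it uses the hostname attribute rather than netloc so that a port, if present, is dropped.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def get_host(url):
    """Return the lowercased host of url without any port (hypothetical helper)."""
    return urlparse(url).hostname


print(get_host('http://abc.hostname.com/somethings/anything/'))
print(get_host('http://abc.hostname.com:8080/somethings/anything/'))
```

Both calls give the full host including the subdomain; stripping the subdomain is a separate step.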


76

Instead of regex or hand-written solutions, you can use Python's urlparse:

from urllib.parse import urlparse

print(urlparse('http://abc.hostname.com/somethings/anything/'))
>> ParseResult(scheme='http', netloc='abc.hostname.com', path='/somethings/anything/', params='', query='', fragment='')

print(urlparse('http://abc.hostname.com/somethings/anything/').netloc)
>> abc.hostname.com

To get the domain without the subdomain:

t = urlparse('http://abc.hostname.com/somethings/anything/').netloc
print ('.'.join(t.split('.')[-2:]))
>> hostname.com
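
The last-two-labels trick can be wrapped in a small function to see where it holds and where it does not. This is only a sketch; last_two_labels is a hypothetical helper name, and the second URL is an illustrative input that is not from the question.

```python
from urllib.parse import urlparse


def last_two_labels(url):
    """Naively keep the last two dot-separated labels of the host."""
    host = urlparse(url).hostname or ''
    return '.'.join(host.split('.')[-2:])


print(last_two_labels('http://abc.hostname.com/somethings/anything/'))  # hostname.com
print(last_two_labels('http://www.example.co.uk/'))  # co.uk, not example.co.uk
```

The second result shows why this approach is only safe when every host is known to end in a single-label suffix such as .com.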

7 Comments

In Python 3, the urlparse library was renamed to urllib.parse.
will it work with something like test.mytest.example.com ?
It will fail with *.co.uk or *.ac.uk domains.
t.split('.')[-2:] literally keeps only the last two substrings, so I am afraid it will just return co.uk and ac.uk, whether you prepend that or not.
This (wrong due to the mentioned reasons) answer has so many up-votes and then we wonder why different software and websites have so many bugs...
38

You can use tldextract.

Example code:

from tldextract import extract
ext = extract("http://abc.hostname.com/somethings/anything/")
# ext.subdomain == 'abc', ext.domain == 'hostname', ext.suffix == 'com'
url = ext.domain + '.' + ext.suffix  # 'hostname.com'
print(url)
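
To illustrate the idea behind suffix-aware extraction without installing anything, here is a rough standard-library sketch. It is not how tldextract is implemented: KNOWN_SUFFIXES is a deliberately tiny, hypothetical stand-in for the real Public Suffix List, and registered_domain is a made-up helper name.

```python
from urllib.parse import urlparse

# Hypothetical, tiny suffix set for illustration only; the real list is far larger.
KNOWN_SUFFIXES = {'com', 'org', 'net', 'uk', 'co.uk', 'ac.uk'}


def registered_domain(url, suffixes=KNOWN_SUFFIXES):
    """Return the known suffix plus one label to its left, or None if no suffix matches."""
    host = urlparse(url).hostname or ''
    labels = host.split('.')
    # Longer candidate suffixes are tried first, so 'co.uk' wins over 'uk'.
    for i in range(1, len(labels)):
        if '.'.join(labels[i:]) in suffixes:
            return '.'.join(labels[i - 1:])
    return None


print(registered_domain('http://abc.hostname.com/somethings/anything/'))  # hostname.com
print(registered_domain('http://www.example.co.uk/'))  # example.co.uk
```

For real use, a maintained library with an up-to-date suffix list is the safer choice.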

2 Comments

tldextract is not a standard lib ( at least not in python 2.7 ) , I think you should mention that. Still +1
Works well! But I am getting: No handlers could be found for logger "tldextract". How can I handle this?
5

Assuming you have the URL in an accessible string, and assuming we want to handle hostnames with any number of subdomain levels, you could do:

token=my_string.split('http://')[1].split('/')[0]
top_level=token.split('.')[-2]+'.'+token.split('.')[-1]

We first split on http:// to remove the scheme from the string. Then we split on / to drop all directory and sub-directory parts, leaving only the hostname. Finally, [-2] takes the second-to-last token after splitting on ., and we append the last token to it, giving the last two labels of the hostname (hostname.com here).

There are probably more graceful and robust ways to do this; for example, if your URL is http://.com it will break, but it's a start :)

4 Comments

Your code can be simplified to token=my_string.split('/')[2], which will also work for ftp:// and https:// URLs.
That is valid feedback :)
@Gahan that's better but doesn't work on file: urls, which usually start with file:///. try token = url.split (':') [1].lstrip ('/').split ('/') [0]. at least that grabs hostname portion. as a bonus it also removes port number if present, which these answers don't. still have issues with parsing .co.uk domains.
@Ed_ file:/// is for local files, in which case, use-case and implementation should have been carefully handled as that is the local files only and does not need to grab any kind of domain from it.
-5

Try:

from urlparse import urlparse

parsed = urlparse('http://abc.hostname.com/somethings/anything/')
domain = parsed.netloc.split(".")[-2:]
host = ".".join(domain)
print host  # prints hostname.com (Python 2 print statement)

1 Comment

won't work with .co.uk
