In my Python script, which fetches a URL and parses the HTML text of the response, the input to my application is fixed: it is always a domain name.

However, I need to convert that domain name into URL format before I can store and process it. I suspect it is not advisable to simply prepend 'https://' to the domain name for this purpose.

As seen below, URL parsing fails because it receives a bare domain name, not a URL.

from urllib.request import urlopen
import requests

url = 'xyz.com'  # a bare domain name; both calls below expect a full URL with a scheme

# Option 1
html = urlopen(url).read()

# Option 2
resp = requests.get(url)
html = resp.text

# Error encountered in both cases: invalid URL (no scheme supplied).
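
To illustrate why both options reject the input, here is a minimal sketch using the standard library's urllib.parse.urlsplit. The variable names bare and full are only illustrative. Without a scheme, the whole string is treated as a path, so there is no host for the client to connect to.

```python
from urllib.parse import urlsplit

# A bare domain has no scheme, so the parser treats it as a relative path.
bare = urlsplit("xyz.com")
# With a scheme and "//", the same text is recognized as the host (netloc).
full = urlsplit("https://xyz.com")

print(repr(bare.scheme), repr(bare.netloc), repr(bare.path))
print(repr(full.scheme), repr(full.netloc), repr(full.path))
```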

What is a good way to convert a domain name to its URL format?

1 Answer

If you want to find out whether "http://"+url or "https://"+url works, you can simply try both, starting with HTTPS:

from urllib.request import urlopen
from urllib.error import URLError

url = 'yourpage.com'
try:
    # Try HTTPS first.
    html = urlopen("https://" + url).read()
except URLError:
    # Fall back to plain HTTP if the HTTPS request fails.
    html = urlopen("http://" + url).read()
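
The same idea can be wrapped in a small reusable helper. This is only a sketch under a few assumptions: the names candidate_urls and fetch_html, the opener parameter, and the 10-second timeout are hypothetical choices for illustration, not part of any library. It also leaves input that already carries an http or https scheme unchanged.

```python
from urllib.error import URLError
from urllib.parse import urlsplit
from urllib.request import urlopen


def candidate_urls(domain, schemes=("https", "http")):
    """Return the URLs to try for a domain, in order of preference."""
    domain = domain.strip()
    # Input that already has an http(s) scheme is kept as it is.
    if urlsplit(domain).scheme in ("http", "https"):
        return [domain]
    return [f"{scheme}://{domain}" for scheme in schemes]


def fetch_html(domain, opener=urlopen, timeout=10):
    """Return (url, body) for the first candidate URL that can be fetched.

    `opener` is a hypothetical hook so the function can be exercised
    without network access; it defaults to urllib.request.urlopen.
    """
    last_error = None
    for url in candidate_urls(domain):
        try:
            with opener(url, timeout=timeout) as resp:
                return url, resp.read()
        except URLError as exc:
            # Remember the failure and move on to the next scheme.
            last_error = exc
    raise last_error
```

A call such as fetch_html('yourpage.com') returns both the URL that worked and the raw bytes, so the working URL can be stored for later use. Note that HTTPError is a subclass of URLError, so an HTTP error status on the HTTPS attempt also triggers the HTTP fallback; whether that is desirable depends on the application.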