Return to Answer

added 12 characters in body

Source Link

edited Jul 28, 2018 at 19:15

113.1k
15
154
396

There are a couple of minor technical issues:

The content variable is unnecessary, because you can simply return html_page.read() directly. (And you could as well return urlopen(req, timeout=10).read() directly...) When the max attempts are reached, you could return "" instead of relying on that content was initialized to "". And how about returning None? Then you could simply omit the return statement to the same effect.
In the exception handling, there are multiple if statements with conditions that are mutually exclusive, only one can match at a time. In such situation you should chain them together with elif.
Instead of doing a single except statement with multiple error types and then using conditionals to identify the correct one, it would be better to use multiple except statements each with a single error type.
You could iterate using range for slightly more compact code.

Like this:

def get_html_content(url, max_attempt = 3):
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})

    for attempt in range(max_attempt):
        try:
            return urlopen(req, timeout=10).read()
        except HTTPError as e:
            print("The server couldn\'t fulfill the request....attempt %d{}/%d" % {}".format(attempt + 1, max_attempt))
            print('Error code: ', e.code)
        except URLError as e:
            print("We failed to reach a server....attempt %d{}/%d" % {}".format(attempt + 1, max_attempt))
            print('Reason: ', e.reason)
        except timeout as e:
            print('timeout...attempt %d{}/%d' % {}'.format(attempt + 1, max_attempt))

There are a couple of minor technical issues:

The content variable is unnecessary, because you can simply return html_page.read() directly. (And you could as well return urlopen(req, timeout=10).read() directly...) When the max attempts are reached, you could return "" instead of relying on that content was initialized to "". And how about returning None? Then you could simply omit the return statement to the same effect.
In the exception handling, there are multiple if statements with conditions that are mutually exclusive, only one can match at a time. In such situation you should chain them together with elif.
Instead of doing a single except statement with multiple error types and then using conditionals to identify the correct one, it would be better to use multiple except statements each with a single error type.
You could iterate using range for slightly more compact code.

Like this:

def get_html_content(url, max_attempt = 3):
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})

    for attempt in range(max_attempt):
        try:
            return urlopen(req, timeout=10).read()
        except HTTPError as e:
            print("The server couldn\'t fulfill the request....attempt %d/%d" % (attempt + 1, max_attempt))
            print('Error code: ', e.code)
        except URLError as e:
            print("We failed to reach a server....attempt %d/%d" % (attempt + 1, max_attempt))
            print('Reason: ', e.reason)
        except timeout as e:
            print('timeout...attempt %d/%d' % (attempt + 1, max_attempt))

There are a couple of minor technical issues:

The content variable is unnecessary, because you can simply return html_page.read() directly. (And you could as well return urlopen(req, timeout=10).read() directly...) When the max attempts are reached, you could return "" instead of relying on that content was initialized to "". And how about returning None? Then you could simply omit the return statement to the same effect.
In the exception handling, there are multiple if statements with conditions that are mutually exclusive, only one can match at a time. In such situation you should chain them together with elif.
Instead of doing a single except statement with multiple error types and then using conditionals to identify the correct one, it would be better to use multiple except statements each with a single error type.
You could iterate using range for slightly more compact code.

Like this:

def get_html_content(url, max_attempt = 3):
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})

    for attempt in range(max_attempt):
        try:
            return urlopen(req, timeout=10).read()
        except HTTPError as e:
            print("The server couldn\'t fulfill the request....attempt {}/{}".format(attempt + 1, max_attempt))
            print('Error code: ', e.code)
        except URLError as e:
            print("We failed to reach a server....attempt {}/{}".format(attempt + 1, max_attempt))
            print('Reason: ', e.reason)
        except timeout as e:
            print('timeout...attempt {}/{}'.format(attempt + 1, max_attempt))

Source Link

answered Jul 28, 2018 at 14:07

janos

113.1k
15
154
396

There are a couple of minor technical issues:

The content variable is unnecessary, because you can simply return html_page.read() directly. (And you could as well return urlopen(req, timeout=10).read() directly...) When the max attempts are reached, you could return "" instead of relying on that content was initialized to "". And how about returning None? Then you could simply omit the return statement to the same effect.
In the exception handling, there are multiple if statements with conditions that are mutually exclusive, only one can match at a time. In such situation you should chain them together with elif.
Instead of doing a single except statement with multiple error types and then using conditionals to identify the correct one, it would be better to use multiple except statements each with a single error type.
You could iterate using range for slightly more compact code.

Like this:

def get_html_content(url, max_attempt = 3):
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})

    for attempt in range(max_attempt):
        try:
            return urlopen(req, timeout=10).read()
        except HTTPError as e:
            print("The server couldn\'t fulfill the request....attempt %d/%d" % (attempt + 1, max_attempt))
            print('Error code: ', e.code)
        except URLError as e:
            print("We failed to reach a server....attempt %d/%d" % (attempt + 1, max_attempt))
            print('Reason: ', e.reason)
        except timeout as e:
            print('timeout...attempt %d/%d' % (attempt + 1, max_attempt))