I'm trying to create a basic link checker in Python.

When using the following code:

import requests

def get_link_response_code(link_to_check):
    resp = requests.get(link_to_check)
    return resp.status_code

I always get the right response code, but it takes a considerable amount of time.

But when using this code (requests.get replaced with requests.head):

def get_link_response_code(link_to_check):  
    resp = requests.head(link_to_check)
    return resp.status_code

It usually works, and very fast, but it sometimes returns HTTP 405 (for a link which is not really broken).

Why am I getting 405 (Method Not Allowed) errors? What can I do to quickly check for broken links? Thanks.

  • This link would be useful Commented Jan 4, 2015 at 7:41
  • It looks like one of the proxies/servers on the "current" route to that (valid!) resource is configured not to accept the HEAD method. Nothing to do with the code itself... Commented Jan 4, 2015 at 8:49

3 Answers

According to the specification, 405 means "Method Not Allowed", which means that you cannot use HEAD for this particular resource.

Handle it and use get() in these cases:

def get_link_response_code(link_to_check):
    resp = requests.head(link_to_check)
    if resp.status_code == 405:
        resp = requests.get(link_to_check)
    return resp.status_code

As a side note, you may not need to make an additional get(), since 405 is kind of a "good" error: the resource exists but is not available with HEAD. You can also check the Allow response header, which must be set in the response to your HEAD request:

The Allow entity-header field lists the set of methods supported by the resource identified by the Request-URI. The purpose of this field is strictly to inform the recipient of valid methods associated with the resource. An Allow header field MUST be present in a 405 (Method Not Allowed) response.
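The two ideas above (fall back to GET on 405, and consult the Allow header) can be combined. The following is only a minimal sketch under some assumptions: `parse_allow` is a hypothetical helper, `session` is assumed to be a `requests.Session` (or any object with compatible `head()`/`get()` methods), and the 10-second timeout is an arbitrary choice. Since a 405 to HEAD may not say whether the page itself exists, the sketch retries with a streamed GET, which should avoid downloading the body.

```python
def parse_allow(header_value):
    """Split an Allow header such as "GET, POST" into a set of method names.

    Hypothetical helper; returns an empty set when the header is absent.
    """
    if not header_value:
        return set()
    return {m.strip() for m in header_value.split(",") if m.strip()}


def get_link_response_code(link_to_check, session, timeout=10):
    """Try HEAD first; if the server answers 405, retry with a streamed GET.

    `session` is assumed to behave like requests.Session.
    """
    # requests does not follow redirects for HEAD by default, so ask for it
    # explicitly to match what GET would do.
    resp = session.head(link_to_check, allow_redirects=True, timeout=timeout)
    if resp.status_code != 405:
        return resp.status_code

    allowed = parse_allow(resp.headers.get("Allow"))
    # If the server explicitly lists its methods and GET is not among them,
    # a GET retry is pointless; an absent header is treated as "unknown".
    if allowed and "GET" not in allowed:
        return resp.status_code

    # stream=True defers the body download; only status and headers are read.
    resp = session.get(link_to_check, stream=True, timeout=timeout)
    try:
        return resp.status_code
    finally:
        resp.close()
```

With requests installed, this could be called as `get_link_response_code(url, requests.Session())`. Reusing one session across many links may also help speed, because connections to the same host can be reused.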

2 Comments

As a side note, configuring a server to disable the HEAD method is generally bad practice.
Great answer, thanks. I will use your change to the code, but your side note is wrong: I tried going to another page on this "blocking" website, e.g. www.domain-with-405.com/non-existent/, and in the browser I get a 404 error, but from the code I still get 405. So if I want to check whether a specific page exists, I must use the get function in those cases. Thanks again.

With requests.get you get the information correctly because the GET method means "retrieve whatever information (in the form of an entity) is identified by the Request-URI", whereas with requests.head the server doesn't return a message body in the response.

Please note that the HEAD method is identical to GET except that the server MUST NOT return a message-body in the response.
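The difference can be observed locally. The following is a minimal sketch that uses only the standard library (`http.server` and `http.client`, not requests) so that it needs no external site; `DemoHandler`, `BODY`, `fetch` and `compare_head_and_get` are hypothetical names made up for the illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"hello, link checker\n"


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers both GET and HEAD for any path."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def do_HEAD(self):
        # Same status line and headers as GET, but no body is written.
        self._send_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(port, method):
    """Send one request to the local server; return (status, Content-Length, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Content-Length"), resp.read()
    finally:
        conn.close()


def compare_head_and_get():
    """Start a throwaway local server and return the HEAD and GET results."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: OS picks a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return fetch(port, "HEAD"), fetch(port, "GET")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    head_result, get_result = compare_head_and_get()
    print("HEAD:", head_result)
    print("GET: ", get_result)
```

Both requests should report the same status and Content-Length, while only the GET result carries the body. That is why HEAD is usually the faster choice for a link checker when the server supports it.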

If you are trying to crawl a web page, your request will probably use the GET method, which should return 200 if everything is OK. However, some server configurations may not allow that method when the request comes from a program, for some reason. You can add some code like this:

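import requests

def get_link_response_code(link_to_check):
    """Return 200 if the link answers OK, otherwise print a message and return None."""
    try:
        resp = requests.head(link_to_check)
    except requests.RequestException as error:
        # Network-level failure (DNS, connection, timeout, ...)
        print(error)
        return None
    if resp.status_code != 200:
        print("error")
        return None
    return resp.status_code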

Hope that helps!
