HTTP headers - Requests - Python

Question

I am trying to scrape a website whose request headers include some attributes that are new to me, such as :authority, :method, :path, and :scheme.

{':authority':'xxxx',':method':'GET',':path':'/xxxx',':scheme':'https','accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8','accept-encoding':'gzip, deflate, br','accept-language':'en-US,en;q=0.9','cache-control':'max-age=0',GOOGLE_ABUSE_EXEMPTION=ID=0d5af55f1ada3f1e:TM=1533116294:C=r:IP=182.71.238.62-:S=APGng0u2o9IqL5wljH2o67S5Hp3hNcYIpw;1P_JAR=2018-8-1-9',   'upgrade-insecure-requests': '1',   'user-agent': 'Mozilla/5.0(WindowsNT6.1;Win64;x64)AppleWebKit/537.36(KHTML,likeGecko)Chrome/68.0.3440.84Safari/537.36',   'x-client-data': 'CJG2yQEIpbbJAQjEtskBCKmdygEI2J3KAQioo8oBCIKkygE=' }

I tried passing them as headers with the HTTP request but ended up with the error shown below.

ValueError: Invalid header name b':scheme'

Any help understanding these attributes, and guidance on using them when sending a request, would be appreciated.

EDIT: code added

import requests

url = 'https://www.google.co.in/search?q=some+text'

headers = {':authority':'xxxx',':method':'GET',':path':'/xxxx',':scheme':'https','accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8','accept-encoding':'gzip, deflate, br','accept-language':'en-US,en;q=0.9','cache-control':'max-age=0','upgrade-insecure-requests': '1',   'user-agent': 'Mozilla/5.0(WindowsNT6.1;Win64;x64)AppleWebKit/537.36(KHTML,likeGecko)Chrome/68.0.3440.84Safari/537.36',   'x-client-data': 'CJG2yQEIpbbJAQjEtskBCKmdygEI2J3KAQioo8oBCIKkygE=' }

response = requests.get(url, headers=headers)

print(response.text)

Please include your code, so that the error can be reproduced.
– Luca Cappelletti, Commented Aug 1, 2018 at 10:06
Header names are not supposed to contain colons, since colons are used as a delimiter in headers.
– blhsing, Commented Aug 1, 2018 at 10:06
@blhsing Thanks for noticing. But I still did not get the proper response. Can you elaborate on those header attributes?
– SanthoshSolomon, Commented Aug 1, 2018 at 10:14

Vinay Challuru · Accepted Answer · 2022-03-28 04:28:33Z

2

Your error comes from the header-name check in Python's standard library source (http.client), which requests relies on through urllib3.

HTTP header names cannot start with a colon, as the RFC states.
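As a rough illustration of the point above, the sketch below reproduces the header-name check offline with the standard library's http.client, then drops the offending names before the dictionary would be handed to requests. The helper names `is_accepted_header_name` and `strip_pseudo_headers` and the host `example.invalid` are made up for this example; no network connection is opened.

```python
import http.client


def is_accepted_header_name(name):
    """Return True if http.client accepts `name` as a header name."""
    # Creating the connection object does not open a socket.
    conn = http.client.HTTPConnection("example.invalid")
    # putrequest() only buffers the request line locally.
    conn.putrequest("GET", "/")
    try:
        conn.putheader(name, "value")
    except ValueError:
        # Raised for names such as ':scheme' (Invalid header name).
        return False
    return True


def strip_pseudo_headers(headers):
    """Return a copy of `headers` without names that start with ':'."""
    return {k: v for k, v in headers.items() if not k.startswith(":")}


copied = {":method": "GET", ":scheme": "https", "accept": "text/html"}
usable = strip_pseudo_headers(copied)
print(usable)
print(all(is_accepted_header_name(name) for name in usable))
```

The filtered dictionary can then be passed as `headers=` to `requests.get()`. Removing these entries only avoids the ValueError; it does not by itself change what the server sends back.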

edited Mar 28, 2022 at 4:28

Vinay Challuru

answered Aug 1, 2018 at 10:11

Nikos Vita Topiko

5 Comments

SanthoshSolomon Over a year ago

Thanks for your answer. I have removed them and the code works fine now. But I still do not get the proper page response. Can you help with that?

Nikos Vita Topiko Over a year ago

Why do you need these headers?

SanthoshSolomon Over a year ago

I am trying to get the response of the webpage, so I am trying different methods. These headers looked strange to me, and I thought they might be the reason I was not getting the response.

Nikos Vita Topiko Over a year ago

Maybe the page is rendered with JavaScript, so you need something like Selenium or github.com/miyakogi/pyppeteer

SanthoshSolomon Over a year ago

Sure. I will check these.

Luk · Accepted Answer · 2018-08-01 10:18:00Z

1

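:authority, :method, :path, :scheme are not regular HTTP headers. They are HTTP/2 pseudo-header fields, which browser developer tools display alongside the real headers.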

https://en.wikipedia.org/wiki/List_of_HTTP_header_fields

':method':'GET'

defines the HTTP request method

https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods

and

:authority, :path, :scheme

are parts of the URI: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Generic_syntax
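To make the mapping concrete, here is a minimal sketch that derives the four values from a method and a URL using only the standard library. The function name `pseudo_header_view` is invented for this example. It assumes the HTTP/2 convention that :path carries the path plus the query string, which is why it differs slightly from the bare URI path.

```python
from urllib.parse import urlsplit


def pseudo_header_view(method, url):
    """Show how a method and URL map onto the four ':' fields."""
    parts = urlsplit(url)
    # An empty path is sent as '/'; the query string travels with the path.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return {
        ":method": method,
        ":scheme": parts.scheme,
        ":authority": parts.netloc,
        ":path": path,
    }


view = pseudo_header_view("GET", "https://www.google.co.in/search?q=some+text")
for name, value in view.items():
    print(name, value)
```

In other words, calling `requests.get(url)` already conveys all four pieces of information, so there is nothing to add to the `headers` dictionary for them.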

answered Aug 1, 2018 at 10:18

Luk

2 Comments

SanthoshSolomon Over a year ago

Are they playing any role in getting a web page's response?

Luk Over a year ago

Yes, but you are already supplying them elsewhere in your code. requests.get() sets the method, and url = 'https://www.google.co.in/search?q=some+text' is your URI (www.google.co.in is the authority, https is the scheme, and /search is the path).
