
My code:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("www.python.org" , 80))
s.sendall(b"GET https://www.python.org HTTP/1.0\n\n")
print(s.recv(4096))
s.close()

Why does the output show me this?

b'HTTP/1.1 500 Domain Not Found\r\nServer: Varnish\r\nRetry-After: 0\r\ncontent-type: text/html\r\nCache-Control: private, no-cache\r\nconnection: keep-alive\r\nContent-Length: 179\r\nAccept-Ranges: bytes\r\nDate: Tue, 11 Jul 2017 15:23:55 GMT\r\nVia: 1.1 varnish\r\nConnection: close\r\n\r\n\n\n\nFastly error: unknown domain \n\n\nFastly error: unknown domain: . Please check that this domain has been added to a service.'

How can I fix it?

  • GET https://www.python.org -- I think you want "GET /" instead. Commented Jul 11, 2017 at 15:37
  • @BrianCain is correct. After the HTTP verb you should provide the relative path to the resource you wish to access. Because you connected to the domain, your requests are already going to www.python.org. If you continue to have issues, add the Host HTTP header. Commented Jul 11, 2017 at 15:38
  • When I do this it is shown in plain text? Commented Jul 11, 2017 at 15:48
  • The issue may actually be that the resource in question is accessed over HTTPS. You have to do a bit more work when using a raw socket to connect to an HTTPS service. Commented Jul 11, 2017 at 15:52

2 Answers


This is wrong on multiple levels:

  • To access an HTTPS resource you need to create a TLS connection (i.e. wrap the existing TCP connection using the ssl module, with proper certificate checking etc.) and then send the HTTP request. The TCP connection in this case should of course go to port 443 (https), not 80 (http).
  • The HTTP request line should contain only the path, not the full URL.
  • The line ending must be \r\n, not \n.
  • You had better send a Host header too, since many servers require it.
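As a rough illustration of the four points above, here is a minimal sketch using only the standard library. The helper names build_request and fetch are made up for this example, and it assumes Python 3 and a server that closes the connection after an HTTP/1.0 response. It is not a complete HTTP client.

```python
import socket
import ssl


def build_request(host, path="/"):
    """Build a minimal HTTP/1.0 GET request (hypothetical helper)."""
    lines = [
        "GET {} HTTP/1.0".format(path),  # path only, not the full URL
        "Host: {}".format(host),         # many servers require a Host header
        "",                              # empty line terminates the headers
        "",
    ]
    # Every line must end with \r\n, not \n.
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path="/", port=443):
    """Send the request over TLS and read until the server closes."""
    # The default context verifies the certificate chain and the hostname.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as tcp:
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            tls.sendall(build_request(host, path))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:  # empty read: the server closed the connection
                    break
                chunks.append(data)
    return b"".join(chunks)


# Usage (needs network access):
# print(fetch("www.python.org")[:200])
```

Note that a single recv(4096), as in the question, may return only part of the response, which is why the sketch loops until the connection is closed.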

And that's only the request. Properly handling the response is a different topic.
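To give an idea of what the response side involves, here is a deliberately simplified sketch that splits a raw response into status, headers and body. The function name parse_response is invented for this example, and the code assumes the whole response has already been read. It ignores chunked transfer encoding, repeated headers, compression and redirects, all of which a real client has to handle.

```python
def parse_response(raw):
    """Split a complete raw HTTP response into (status, reason, headers, body)."""
    # Headers and body are separated by the first empty line.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line looks like: HTTP/1.1 500 Domain Not Found
    parts = (lines[0].split(" ", 2) + ["", ""])[:3]
    status, reason = int(parts[1]), parts[2]

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise to lower case.
        headers[name.strip().lower()] = value.strip()
    return status, reason, headers, body


raw = (b"HTTP/1.1 500 Domain Not Found\r\n"
       b"Server: Varnish\r\n"
       b"Content-Length: 4\r\n"
       b"\r\n"
       b"oops")
status, reason, headers, body = parse_response(raw)
print(status, reason, headers["server"], body)
```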

I really, really recommend using an existing library like requests. HTTP(S) is considerably more complex than most people think who have only looked at a few traffic captures.


5 Comments

I highly recommend the requests library instead of raw sockets, unless you want to learn the hard way.
@Ch.Sohaib: Are you asking for sample code for requests? That would be print(requests.get('https://www.python.org').content). Or are you asking how to fix your code? I don't think that is worth it, since too much is wrong.
@Ch.Sohaib: I use stackoverflow.com more as a way to help others create the right code and learn that way, rather than to write code for others. I've pointed out several problems with your code, which primarily come from a limited understanding of how HTTP and HTTPS work. I recommend you first improve your understanding of HTTP(S) and try to fix the mentioned problems yourself. If you have specific problems with this I'm willing to help, but I won't just write the code for you. I recommend starting with plain HTTP and, once you manage that, continuing with HTTPS.
Ok no problem bro.
import requests

# Fetch the page over HTTPS; requests handles TLS and certificate checks.
x = requests.get('https://www.python.org')
print(x.text)

With the requests library, HTTPS requests are very simple! If you're doing this with raw sockets, you have to do a lot more work, such as negotiating a cipher. Try the above code (it runs on Python 2.7 as well as Python 3).

I would also note that, in my experience, Python is excellent for doing things quickly. If you are learning about networking and cryptography, try writing an HTTPS client on your own using sockets. If you want to automate something quickly, use the tools that are available to you. I almost always use requests for this type of task. As an additional note, if you're interested in parsing HTML content, check out the PyQuery library. I've used it to automate interaction with many web services.

Requests

PyQuery
