import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('www.py4e.com', 80))
mysock.send('GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n')
while True:
    data = mysock.recv(512)
    if(len(data) < 1):
        break
    print (data)
mysock.close()
Quite simple: don't use http:// in the host you pass to .connect().
http:// is a protocol and www.py4e.com is a host (or an A record in a DNS server). The standard socket library doesn't know anything about application protocols, and therefore requires only a host and a port number.
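To make the protocol/host split concrete, here is a small sketch using the standard library's urllib.parse.urlsplit. The helper name split_for_socket is made up for illustration, and the fallback to port 80 is an assumption that only holds for plain http:// URLs.

```python
from urllib.parse import urlsplit

def split_for_socket(url):
    """Split a URL into the (host, port) that connect() wants, plus the path.

    Hypothetical helper for illustration only; assumes a plain http:// URL.
    """
    parts = urlsplit(url)
    host = parts.hostname        # 'www.py4e.com': no scheme, no path
    port = parts.port or 80      # assumption: default to 80 for http
    path = parts.path or '/'
    if parts.query:
        path += '?' + parts.query
    return host, port, path

host, port, path = split_for_socket('http://www.py4e.com/code3/mbox-short.txt')
print(host, port, path)  # www.py4e.com 80 /code3/mbox-short.txt
```

Only host and port go to .connect(); the scheme is never handed to the socket at all.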
If you'd rather have this handled for you, check out urllib.request or @Mego's answer using Requests, both of which handle the connection and the HTTP parsing for you.
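As a rough idea of what the urllib.request route looks like, here is a minimal sketch. The function name fetch_text is hypothetical, and falling back to UTF-8 when the server declares no charset is my assumption, not something the server guarantees.

```python
import urllib.request

def fetch_text(url):
    """Fetch a URL and return the response body as text (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        # Assumption: fall back to UTF-8 if no charset is declared.
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset)

# Example call (needs network access):
# print(fetch_text('http://www.py4e.com/code3/mbox-short.txt'))
```

Note that here you do pass the full URL, http:// included, because urllib speaks HTTP and works out the host, port and path on its own.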
Also, if you're using Python 3 (which you probably should be), you need to send bytes when calling .send().
There are two ways of converting your string to bytes:
mysock.send(b'GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n')
mysock.send(bytes('GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n', 'UTF-8'))
Both do basically the same thing.
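If you want to convince yourself, a quick check along these lines should do. The variable names are only for illustration, and str.encode() is a third spelling that isn't mentioned above but produces the same bytes.

```python
request_line = 'GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n'

as_literal = b'GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n'
as_bytes_call = bytes(request_line, 'UTF-8')
as_encode = request_line.encode('utf-8')  # third, equivalent spelling

print(as_literal == as_bytes_call == as_encode)  # True
print(type(as_literal).__name__)                 # bytes
```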
Finally, in a GET request you don't include http:// either.
Instead, you just send the path to the file you want to retrieve:
mysock.send(b'GET /code3/mbox-short.txt HTTP/1.0\n\n')
The reason is (again) that http:// is a protocol descriptor and not part of the actual protocol data being sent. You also don't need the host name in the GET line itself, because the server you connected to already knows which machine you reached, since you're... connected to it.
Instead, the server expects you to supply a Host: <hostname>\r\n header if it is serving multiple virtual hosts.
You might need a few other headers, though, to be able to request actual content from certain web servers.
But this is the basic gist of things.
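Putting the pieces together, a request with a Host header could be sketched like this. The helper names build_get_request and fetch_raw are made up for illustration. I've used \r\n line endings here because that's what the HTTP spec calls for (many servers tolerate a bare \n, as in the snippets above), and sendall() rather than send() so the whole request is written.

```python
import socket

def build_get_request(host, path):
    """Build a minimal HTTP/1.0 GET request as bytes (hypothetical helper)."""
    lines = [
        f'GET {path} HTTP/1.0',
        f'Host: {host}',  # lets a server with several virtual hosts pick the right site
        '',               # blank line marks the end of the headers
        '',
    ]
    return '\r\n'.join(lines).encode('ascii')

def fetch_raw(host, path, port=80):
    """Send the request over a plain socket and return the raw response bytes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(512)
            if not data:  # empty bytes means the server closed the connection
                break
            chunks.append(data)
    return b''.join(chunks)

# Example call (needs network access):
# print(fetch_raw('www.py4e.com', '/code3/mbox-short.txt').decode())
```

What comes back is the raw response, status line and headers included, so you'd still have to split those off from the body yourself.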
Continue reading
Here's a good start:
It shows you what a raw GET request looks like.
And in the future, I recommend using your browser's built-in network debugger, which can show raw headers, raw responses and a whole bunch of other things.