
I am writing a script that grabs some data from my website using HTTP GET.

My problem is that I have to pass Unicode characters to the website.

I read these characters from a file and then try to build a URL from them in order to make the request.

The file is UTF-8 encoded, and I use this to read from it:

import codecs
f = codecs.open("values.txt", encoding='utf-8')
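For comparison, in Python 3 the built-in open() accepts an encoding argument directly, so codecs.open is not needed. A minimal sketch, assuming a UTF-8 file with one word per line like the question's values.txt; the helper read_first_word is hypothetical, and a temporary file stands in for values.txt so the example runs on its own:

```python
import os
import tempfile


def read_first_word(path):
    """Return the first line of a UTF-8 text file without its trailing newline."""
    # Python 3's open() decodes the bytes as UTF-8 and yields str (Unicode) lines
    with open(path, encoding="utf-8") as f:
        # readline() keeps the trailing "\n", which would otherwise end up in the URL
        return f.readline().rstrip("\n")


# Demo: write a temporary UTF-8 file (a stand-in for values.txt)
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("Здравей\nсвят\n")
    path = tmp.name

word = read_first_word(path)
os.remove(path)
print(word)
```

Note that readline() returns the line including its newline, so stripping it before building a URL avoids sending an encoded %0A at the end of the value.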

Then I read the first line of the file and concatenate the value with the URL:

sUrl = "http://example.com?word="
value = f.readline()
visitUrl = sUrl + value

If I use print visitUrl, the output is correct, i.e. http://example.com?word

How can I use visitUrl without destroying my special characters? I tried encoding the string to ASCII, but that doesn't work for all characters.

2 Answers


Quote the URL:

import urllib
s = u'Здравей'
urllib.quote(s.encode('utf-8'))
# %D0%97%D0%B4%D1%80%D0%B0%D0%B2%D0%B5%D0%B9
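The snippet above is Python 2. In Python 3, quote lives in urllib.parse and encodes str arguments as UTF-8 by default, so the explicit .encode('utf-8') step is no longer required. A minimal sketch of the equivalent call:

```python
from urllib.parse import quote

s = 'Здравей'
# quote() encodes the str as UTF-8 (its default) and percent-escapes
# every byte outside the unreserved/safe set
quoted = quote(s)
print(quoted)  # %D0%97%D0%B4%D1%80%D0%B0%D0%B2%D0%B5%D0%B9
```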

Or use urlencode directly to build the query part of the URL:

urllib.urlencode({'data': s.encode('utf-8')})
# 'data=%D0%97%D0%B4%D1%80%D0%B0%D0%B2%D0%B5%D0%B9'
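The Python 3 counterpart is urllib.parse.urlencode, which quotes each key and value (with quote_plus by default) and encodes str values as UTF-8, so the dictionary can hold the Unicode string directly. A minimal sketch:

```python
from urllib.parse import urlencode

s = 'Здравей'
# Each value is percent-encoded as UTF-8; pairs are joined with '&'
query = urlencode({'data': s})
print(query)  # data=%D0%97%D0%B4%D1%80%D0%B0%D0%B2%D0%B5%D0%B9

# Because quote_plus is used, spaces become '+' rather than %20
spaced = urlencode({'q': 'a b'})
print(spaced)  # q=a+b
```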

2 Comments

Should I choose urllib or urllib2?
@messkech: Those functions are in urllib. Don't let the name urllib2 mislead you into thinking it's an alternative library; it's actually an extension of urllib, and the two libraries were merged in Python 3.

Build the URL with urllib.urlencode rather than constructing it by concatenating strings. Non-ASCII characters in a URL need to be URL-encoded (percent-encoded).
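Applied to the original problem, a minimal Python 3 sketch might look like this. It assumes the base URL http://example.com and the query parameter name word from the question; build_url is a hypothetical helper, and the request object is only constructed here, not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_url(base, word):
    """Append ?word=<value> to base, percent-encoding the value as UTF-8."""
    # Let urlencode() do the escaping instead of concatenating the raw value
    return base + "?" + urlencode({"word": word})


visit_url = build_url("http://example.com", "Здравей")
print(visit_url)  # http://example.com?word=%D0%97%D0%B4%D1%80%D0%B0%D0%B2%D0%B5%D0%B9

# A Request without a data payload is a GET; pass it to
# urllib.request.urlopen() to actually send it
request = Request(visit_url)
print(request.get_method())  # GET
```

Keeping the escaping inside urlencode means every value, including ones with spaces, '&' or '=', is encoded consistently, which string concatenation does not guarantee.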
