
i18n on Entry widgets


OK, this is actually starting to make sense :-) Here is what I think is happening:
You get different results in the IDE and the console because they use different encodings. The IDE uses utf-8, so the params are encoded as utf-8. The console uses latin-1, so you get latin-1 encoded params.
When you use babelfish from the browser, it gets a page in utf-8 and sends the parameters back the same way, probably with a header saying they are utf-8. When you use urllib you don't tell babelfish the encoding, so it assumes latin-1; that's why the interpreter version works.
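
To make the difference concrete, here is a minimal sketch of how the same word percent-encodes under the two encodings. It is written for Python 3 (urllib.parse) rather than the Python 2 urllib used in this thread, and the parameter name is only borrowed from the posted code for illustration.

```python
from urllib.parse import quote, urlencode

phrase = "ent\u00e3o"  # the word "então" as text, not bytes

# The same text turns into different percent-escapes depending on
# which byte encoding is applied before escaping.
utf8_form = quote(phrase, encoding="utf-8")
latin1_form = quote(phrase, encoding="latin-1")

print(utf8_form)    # ent%C3%A3o
print(latin1_form)  # ent%E3o

# urlencode takes the same encoding argument for text values.
print(urlencode({"urltext": phrase}, encoding="latin-1"))  # urltext=ent%E3o
```

These are the same two forms reported in the original question, which is consistent with the IDE handing over utf-8 bytes and the console handing over latin-1 bytes.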
So in your GUI version, if you get utf-8 from the GUI, you can convert it to latin-1 with
    phrase.decode('utf-8').encode('latin-1')
as long as your text can be expressed in latin-1. If you need utf-8, then you have to figure out how to tell babelfish that you are sending utf-8.
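
A small sketch of that conversion, again in Python 3 spelling, where the utf-8 input is a bytes object. The helper name to_latin1 is made up for this example; it shows what happens when the text has no latin-1 form.

```python
# utf-8 bytes for "então", as they might arrive from a GUI widget
utf8_bytes = b"ent\xc3\xa3o"

# Decode to text first, then re-encode in the target encoding.
latin1_bytes = utf8_bytes.decode("utf-8").encode("latin-1")
print(latin1_bytes)  # b'ent\xe3o'


def to_latin1(utf8_data):
    """Return latin-1 bytes, or None if the text cannot be expressed in latin-1."""
    try:
        return utf8_data.decode("utf-8").encode("latin-1")
    except UnicodeEncodeError:
        return None


# Chinese text has no latin-1 representation, so the conversion fails.
print(to_latin1("\u4e2d\u6587".encode("utf-8")))  # None
```

Anything outside latin-1 still has to go out as utf-8, which is why the server needs to be told which encoding it is receiving.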
Kent
PS please reply to the list not to me personally.
Jorge Louis de Castro wrote:
Thanks again,
I'm sorry to be such a PITB, but this is driving me insane! The code
below easily connects to babelfish and returns a translated string.

import re, string, sys, urllib

# Patterns tried in order to pull the translation out of the returned HTML
__where = [ re.compile(r'name=\"q\">([^<]*)'),
            re.compile(r'td bgcolor=white>([^<]*)'),
            re.compile(r'td bgcolor=white class=s><div style=padding:10px;>([^<]*)'),
            re.compile(r'<\/strong><br>([^<]*)')
          ]

def clean(text):
    # Collapse newlines and runs of whitespace into single spaces
    return ' '.join(string.replace(text.strip(), "\n", ' ').split())

def translateByCode(phrase, from_code, to_code):
    phrase = clean(phrase)
    params = urllib.urlencode( { 'BabelFishFrontPage' : 'yes',
                                 'doit' : 'done',
                                 'urltext' : phrase,
                                 'lp' : from_code + '_' + to_code } )
    print "URL encoding ", params
    try:
        response = urllib.urlopen('', params)
    except IOError, what:
        print "ERRRR TRANSLATING ", what
    except:
        print "Unexpected error:", sys.exc_info()[0]
    html = response.read()
    for regex in __where:
        match = regex.search(html)
        if match: break
    if not match: print "ERRR MATCHING"
    return clean(match.group(1))

if __name__ == '__main__':
    print translateByCode('', 'pt', 'en')

If I run this through the Run option on the IDE I get the following output:
URL encoding doit=done&urltext=ent%C3%
o
o
If I import this module on the interpreter and then call
print translateByCode('', 'en', 'pt')
I get:
URL encoding doit=done&urltext=ent%
then
then
Now the urllib encoding of the urltext IS different ("ent%C3%A3o" VS
"ent%E3o") even though I'm passing the same stuff!
And this works fine except when I use special characters, and I don't
know how to use the utf-8 encoding to get this working. I know AltaVista
uses utf-8 because they also translate Chinese.
Thanks again, and sorry for the blurb, but I ran out of solutions for this
one.
Tutor maillist - Tutor (AT) python (DOT) org


No. 1# | By Developer Tags User at [2008-5-10] | size: 5542 bytes

I got it working with a utf-8 query by adding an Accept-Charset header to the request. I used the 'Tamper Data' add-on to Firefox to view all the request headers being sent by the browser. I added all the same headers to the Python request and it worked. Then I removed headers one at a time until I found the one that was needed. Here is a stripped-down version of your code that posts a word encoded in utf-8 and gets the correct response. I also changed the post parameters a little to match what I am seeing in my browser:

import re, urllib, urllib2

# Patterns tried in order to pull the translation out of the returned HTML
__where = [ re.compile(r'name=\"q\">([^<]*)'),
            re.compile(r'td bgcolor=white>([^<]*)'),
            re.compile(r'td bgcolor=white class=s><div style=padding:10px;>([^<]*)'),
            re.compile(r'<\/strong><br>([^<]*)')
          ]

phrase = 'ent\xc3\xa3o'   # utf-8 encoded bytes
params = urllib.urlencode( { 'doit' : 'done',
                             'tt' : 'urltext',
                             'trtext' : phrase,
                             'intl' : 1,
                             'lp' : 'pt_en' } )
print "URL encoding ", params

req = urllib2.Request('')

# The one header that turned out to be needed
req.add_header('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.7')

response = urllib2.urlopen(req, params)

html = response.read()
for regex in __where:
    match = regex.search(html)
    if match:
        print match.group(1)
        break
else:
    # No pattern matched: dump the page for inspection
    print "ERRR MATCHING"
    print html
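
For anyone on Python 3, where urllib2 no longer exists, the same idea might look roughly like the sketch below. It only builds the request and does not send it. The URL is a hypothetical placeholder (the real one is not shown above), build_request is a name invented for this example, and the bare 'utf-8' header value is an assumption, not something tested against the service.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder; substitute the real translation URL.
TRANSLATE_URL = "http://translator.invalid/tr"


def build_request(phrase, lang_pair, url=TRANSLATE_URL):
    """Build (but do not send) a POST request carrying utf-8 encoded params."""
    params = urlencode(
        {"doit": "done", "tt": "urltext", "trtext": phrase,
         "intl": 1, "lp": lang_pair},
        encoding="utf-8",  # percent-encode the text as utf-8 bytes
    )
    # After percent-encoding the body is pure ASCII.
    req = Request(url, data=params.encode("ascii"))
    # Assumed header value: advertise utf-8 to the server.
    req.add_header("Accept-Charset", "utf-8")
    return req


req = build_request("ent\u00e3o", "pt_en")
print(req.get_method())  # POST, because a data body is attached
print(req.data)
```

Passing the returned object to urllib.request.urlopen would send it; the response bytes would then need an explicit .decode() before the regular expressions are applied.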

Kent



