Python

  • i18n on Entry widgets

    1 answer - 3232 bytes

    OK, this is actually starting to make sense :-) Here is what I think is happening:
    You get different results in the IDE and the console because they use different encodings. The IDE uses utf-8, so the params are encoded as utf-8. The console uses latin-1, so you get latin-1-encoded params.
    When you use babelfish from the browser, it gets a page in utf-8 and sends the parameters back the same way, but probably with a header saying it is utf-8. When you use urllib, you don't tell babelfish the encoding, so it presumably assumes latin-1; that's why the interpreter version works.
    So in your GUI version, if you get utf-8 from the GUI, you can convert it to latin-1 with
    phrase.decode('utf-8').encode('latin-1'), as long as your text can be expressed in latin-1. If you need utf-8, then you have to figure out how to tell babelfish that you are sending utf-8.
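
    For readers on Python 3, where text and bytes are separate types, the conversion above might look roughly like the sketch below. The helper name utf8_to_latin1 is made up for illustration, and the second call only demonstrates the "as long as your text can be expressed in latin-1" caveat.

```python
def utf8_to_latin1(raw):
    """Re-encode utf-8 bytes as latin-1 bytes.

    Raises UnicodeEncodeError if the text has no latin-1 representation.
    """
    return raw.decode("utf-8").encode("latin-1")


# The word "ent\u00e3o" as utf-8 bytes: the a-tilde takes two bytes.
utf8_bytes = b"ent\xc3\xa3o"
print(utf8_to_latin1(utf8_bytes))  # b'ent\xe3o'

# A character outside latin-1 (here a CJK ideograph) cannot be converted.
try:
    utf8_to_latin1("\u4e2d".encode("utf-8"))
except UnicodeEncodeError as exc:
    print("not representable in latin-1:", exc.reason)
```

    The failure case is the reason the latin-1 route only helps for western European text; anything else has to go out as utf-8.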
    Kent
    PS please reply to the list not to me personally.
    Jorge Louis de Castro wrote:
    Thanks again,
    I'm sorry to be such a PITB, but this is driving me insane! The code
    below easily connects to babelfish and returns a translated string.
    import re, string, sys, urllib

    __where = [ re.compile(r'name=\"q\">([^<]*)'),
                re.compile(r'td bgcolor=white>([^<]*)'),
                re.compile(r'td bgcolor=white class=s><div style=padding:10px;>([^<]*)'),
                re.compile(r'<\/strong><br>([^<]*)')
              ]

    def clean(text):
        # collapse newlines and runs of whitespace into single spaces
        return ' '.join(string.replace(text.strip(), "\n", ' ').split())

    def translateByCode(phrase, from_code, to_code):
        phrase = clean(phrase)
        params = urllib.urlencode( { 'BabelFishFrontPage' : 'yes',
                                     'doit' : 'done',
                                     'urltext' : phrase,
                                     'lp' : from_code + '_' + to_code } )
        print "URL encoding ", params
        try:
            response = urllib.urlopen('', params)
        except IOError, what:
            print "ERRRR TRANSLATING ", what
            return None
        except:
            print "Unexpected error:", sys.exc_info()[0]
            return None
        html = response.read()
        # try each pattern in turn and stop at the first one that matches
        for regex in __where:
            match = regex.search(html)
            if match: break
        if not match:
            print "ERRR MATCHING"
            return None
        return clean(match.group(1))

    if __name__ == '__main__':
        print translateByCode('', 'pt', 'en')
    If I run this through the Run option on the IDE I get the following output:
    URL encoding doit=done&urltext=ent%C3%
    o
    o
    If I import this module in the interpreter and then call
    print translateByCode('', 'en', 'pt')
    I get:
    URL encoding doit=done&urltext=ent%
    then
    then
    Now the urllib encoding of the urltext IS different ("ent%C3%A3o" vs.
    "ent%E3o") even though I'm passing the same stuff!
    This works fine except when I use special characters, and I don't
    know how to use the utf-8 encoding to get this working. I know AltaVista
    uses utf-8 because they also translate Chinese.
    Thanks again, and sorry for the blurb, but I ran out of solutions for this
    one.
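
    The two parameter strings quoted above can be reproduced in Python 3, where urlencode takes the byte encoding explicitly. This is only a sketch under the assumption that the encoding of the phrase is the sole difference between the two runs; the helper name encode_param is hypothetical.

```python
from urllib.parse import urlencode


def encode_param(text, encoding):
    """Percent-encode one form field using the given byte encoding."""
    return urlencode({"urltext": text}, encoding=encoding)


phrase = "ent\u00e3o"  # the test word, written with an escape to stay ASCII-safe

# Same text, two byte encodings, two different query strings.
print(encode_param(phrase, "utf-8"))    # urltext=ent%C3%A3o
print(encode_param(phrase, "latin-1"))  # urltext=ent%E3o
```

    In Python 2 the phrase is already a byte string by the time urlencode sees it, so the result depends on whichever encoding the IDE or console used to produce those bytes.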
    Tutor maillist - Tutor (AT) python (DOT) org
  • No.1 | | 5542 bytes | |

    I got it working with a utf-8 query by adding an Accept-Charset header to the request. I used the 'Tamper Data' add-on to Firefox to view all the request headers being sent by the browser. I added all the same headers to the Python request and it worked. Then I took out the headers until I found the needed one. Here is a stripped-down version of your code that posts a word encoded in utf-8 and gets the correct response. I also changed the post parameters a little to match what I am seeing in my browser:

    import re, urllib, urllib2

    __where = [ re.compile(r'name=\"q\">([^<]*)'),
    re.compile(r'td bgcolor=white>([^<]*)'),
    re.compile(r'td bgcolor=white class=s><div style=padding:10px;>([^<]*)'),
    re.compile(r'<\/strong><br>([^<]*)')
    ]

    phrase = 'ent\xc3\xa3o'
    params = urllib.urlencode( { 'doit' : 'done',
    'tt' : 'urltext',
    'trtext' : phrase,
    'intl' : 1,
    'lp' : 'pt_en' } )
    print "URL encoding ", params

    req = urllib2.Request('')

    req.add_header('Accept-Charset', 'IS,utf-8;q=0.7,*;q=0.7')

    response = urllib2.urlopen(req, params)

    html = response.read()
    for regex in __where:
        match = regex.search(html)
        if match:
            print match.group(1)
            break
    else:
        print "ERRR MATCHING"
        print html

    Kent
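
    For anyone trying this on Python 3, here is a rough sketch of how the same request could be assembled with urllib.request. It only builds the request and does not send it. The URL is a placeholder because the thread does not show the real one, the header value "utf-8" is an assumption, and build_request is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: the thread's URL is not shown, so this is a stand-in.
TRANSLATE_URL = "http://translate.example.invalid/tr"


def build_request(phrase, lang_pair):
    """Build (but do not send) a POST request carrying utf-8 form data."""
    params = urlencode({
        "doit": "done",
        "tt": "urltext",
        "trtext": phrase,
        "intl": 1,
        "lp": lang_pair,
    })  # str values are percent-encoded as utf-8 by default
    # The encoded query is pure ASCII, so it converts to bytes safely.
    req = Request(TRANSLATE_URL, data=params.encode("ascii"))
    # Assumed header value; it tells the server which charsets we accept.
    req.add_header("Accept-Charset", "utf-8")
    return req


req = build_request("ent\u00e3o", "pt_en")
print(req.get_method())
print(req.data)
print(req.get_header("Accept-charset"))
```

    Note that urllib.request stores header names in capitalized form, which is why the lookup uses "Accept-charset". Passing the request to urllib.request.urlopen would actually send it.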


