Python

  • ansi vs. unicode


    Just a quick question about these two versions of wxPython. If I don't
    purposely take advantage of the Unicode features of the Unicode build of
    wxPython, does that mean that my program will run the same with either
    version?
    I guess another way to ask it is, if I build a program with the ANSI
    version of wxPython, does that hinder it in any way even if I'm not
    trying to do anything special with Unicode characters?
    In other words, to put it yet another way, is Unicode something you must
    use explicitly in order to get any use out of the Unicode build,
    otherwise they are the same?
    I'm just trying to figure out which version I really need. I figure
    since I even have to ask the question, I could probably settle for ANSI,
    but I want to make sure that this doesn't handcuff me later, such as
    when I might want to switch to the Unicode version.
    Thanks,
    John
    To unsubscribe, e-mail: wxPython-users-unsubscribe (AT) lists (DOT) wxwidgets.org
    For additional commands, e-mail: wxPython-users-help (AT) lists (DOT) wxwidgets.org
  • No.1 | | 3193 bytes | |

    John Salerno <johnjsal (AT) NSPAMgmail (DOT) com> wrote:
    Just a quick question about these two versions of wxPython. If I don't
    purposely take advantage of the Unicode features of the Unicode build of
    wxPython, does that mean that my program will run the same with either
    version?

    Not necessarily. It doesn't take too much work to inadvertently paste a
    unicode character into some control from some other unicode-enabled
    application (like your web browser, email client, etc.). In a unicode
    build, that character will look as you expect it to, and will work just
    fine. That is, until you try to save the content of that control to
    disk. Then you get to have fun with the wonderful world of encodings.

    Using an ANSI build, you'll probably either not be able to paste that
    bit of unicode text, or when you do, any non-ascii characters will
    probably be displayed as a garbage character of some kind (I've noticed
    boxes generally). On the upside, you also won't need to bother with
    encodings when trying to save data.

    Either way, you may need to deal with encodings when *reading* data from
    disk, if that data can come from disparate sources with/without
    encodings, etc.
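A minimal sketch of the saving and loading problem described above, assuming the text has already been fetched from a control as a unicode string. The helper names `save_text` and `load_text` are made up for illustration and are not part of wxPython.

```python
import io
import os
import tempfile


def save_text(path, text, encoding="utf-8"):
    """Write a unicode string to disk using an explicit encoding."""
    with io.open(path, "w", encoding=encoding) as f:
        f.write(text)


def load_text(path, encoding="utf-8"):
    """Read the file back, decoding with the same explicit encoding."""
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


# Text of the kind that might be pasted into a control from a browser.
pasted = u"na\u00efve caf\u00e9"

# Encoding to plain ASCII fails as soon as a non-ascii character appears.
try:
    pasted.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Round trip through a temporary file with an explicit encoding.
path = os.path.join(tempfile.mkdtemp(), "note.txt")
save_text(path, pasted)
restored = load_text(path)
```

Naming the encoding on both the write and the read is the point of the sketch; which encoding to pick (UTF-8 here) is an assumption and depends on where the data has to be read later.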

    I guess another way to ask it is, if I build a program with the ANSI
    version of wxPython, does that hinder it in any way even if I'm not
    trying to do anything special with Unicode characters?

    It depends.

    In other words, to put it yet another way, is Unicode something you must
    use explicitly in order to get any use out of the Unicode build,
    otherwise they are the same?

    If your software is open source, there are good odds that some
    non-english-speaking user is going to pick it up and try to use it.
    They will be putting in characters from their language, and if/when it
    doesn't work, you will get a "please add unicode support" request.

    I'm just trying to figure out which version I really need. I figure
    since I even have to ask the question, I could probably settle for ANSI,
    but I want to make sure that this doesn't handcuff me later, such as
    when I might want to switch to the Unicode version.

    As long as you are explicitly handling saving/loading in an
    encoding-aware way, ANSI may be sufficient (don't load non-ascii files).
    Earlier versions of PyPE didn't support unicode, saving, or loading with
    a particular encoding. I eventually got a feature request, and added
    the necessary support that is only run in Unicode builds. The current
    version includes detection of encoding for Python coding: directives,
    XML encoding declarations, and BOMs. If you download the source version,
    it can be seen in pype.py:PythonSTC.SetText and GetText.
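As an illustration of the byte order mark part of that detection, here is a rough sketch using only the standard `codecs` module. It is not PyPE's code; `sniff_bom` and `decode_bytes` are hypothetical helpers, and the fallback to UTF-8 when no mark is present is an assumption.

```python
import codecs

# Check the 4-byte UTF-32 marks before the 2-byte UTF-16 ones, because
# the UTF-32 LE mark begins with the same bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data, default="utf-8"):
    """Return a codec name based on a leading byte order mark, if any."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return default  # assumption: no mark means the default encoding


def decode_bytes(data, default="utf-8"):
    """Decode raw file bytes; the chosen codecs consume the mark."""
    return data.decode(sniff_bom(data, default))
```

Coding: directives and XML declarations would need a separate scan of the first lines of the file, which this sketch leaves out.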

    Depending on what you plan to do with the content, and whether you plan
    on having any sort of persistence, you may need to deal with unicode and
    encodings.

    - Josiah

  • No.2 | | 655 bytes | |

    Josiah Carlson wrote:

    Depending on what you plan to do with the content, and/or/if you plan on
    having any sort of persistence, you may need to deal with unicode and
    encodings.

    Thanks very much. One more question: is it ok to use the Unicode
    version, even if I don't deal with Unicode (just to be "safe")? Does the
    Unicode build cause any extra overhead or anything else that ANSI
    doesn't do, even if I don't use Unicode with it?

  • No.3 | | 1324 bytes | |

    John Salerno <johnjsal (AT) NSPAMgmail (DOT) com> wrote:

    Josiah Carlson wrote:

    Depending on what you plan to do with the content, and/or/if you plan on
    having any sort of persistence, you may need to deal with unicode and
    encodings.

    Thanks very much. One more question: is it ok to use the Unicode
    version, even if I don't deal with Unicode (just to be "safe")? Does the
    Unicode build cause any extra overhead or anything else that ANSI
    doesn't do, even if I don't use Unicode with it?

    Generally a slight memory increase during runtime, both because the
    unicode dll is slightly larger, and because every native control and
    unicode string will generally be representing every character internally
    as 2 bytes rather than 1.

    If you know what you are getting yourself into, I would suggest just
    using the unicode version and figuring out what kind of persistence you
    are going to need between runs (preferences, etc.), and making sure that
    it is at least unicode agnostic (for preference saving, the miniconf
    module works reasonably well).

    - Josiah

  • No.4 | | 1998 bytes | |

    and because every native control and unicode string will generally be
    representing every character internally
    as 2 bytes rather than 1.

    This depends on whether you choose utf-8 or utf-16. utf-8 uses only 1
    byte to encode ascii chars, whereas utf-16 uses two bytes. The advantage
    of utf-16 is that it uses fewer bytes for CJK languages. The advantages
    of utf-8 are that you can convert ascii directly (the same hex value) and
    that it uses only 1 byte for ascii, but for CJK it will use up to four
    bytes per character.

    Rune,
    reporting from the wonderful world of encoding.
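The byte counts above can be checked directly. This is a small sketch; the sample characters are chosen only for illustration, and `utf-16-le` is used so that no byte order mark is added to the count.

```python
def encoded_sizes(ch):
    """Return (utf-8 length, utf-16 length) in bytes for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


samples = {
    "ascii": u"A",                # U+0041
    "latin": u"\u00e9",           # U+00E9, e with acute accent
    "cjk": u"\u4e2d",             # U+4E2D, a common CJK ideograph
    "beyond_bmp": u"\U00020000",  # U+20000, a CJK extension character
}

sizes = dict((name, encoded_sizes(ch)) for name, ch in samples.items())
```

With these samples, the common CJK ideograph takes three bytes in utf-8 and two in utf-16; only characters outside the Basic Multilingual Plane reach four bytes, and they take four in both encodings.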


  • No.5 | | 1348 bytes | |

    "Rune Devik" <rune.devik (AT) gmail (DOT) comwrote:
    and because every native control and unicode string will generally be
    representing every character internally
    as 2 bytes rather than 1.

    This depends on whether you choose utf-8 or utf-16. utf-8 uses only 1
    byte to encode ascii chars, whereas utf-16 uses two bytes. The advantage
    of utf-16 is that it uses fewer bytes for CJK languages. The advantages
    of utf-8 are that you can convert ascii directly (the same hex value) and
    that it uses only 1 byte for ascii, but for CJK it will use up to four
    bytes per character.

    Note that I said "native control" and "unicode string". Not "encoded
    unicode string". Unless one goes to extraordinary measures, Python is
    compiled with a 2-byte per code point representation (UCS-2), Windows
    uses 2-byte unicode characters (also UCS-2), and the underlying native
    controls (in Windows and I believe wxGTK) also use 2-byte characters (in
    UCS-2).

    For writing to disk, you can certainly use utf-8 as an encoding to get
    1-byte characters for many European code points, but that wasn't what I
    was pointing out.

    - Josiah

  • No.6 | | 1286 bytes | |

    Yup, that is true :)
    - Rune
  • No.7 | | 1497 bytes | |

    Josiah Carlson wrote:

    Unless one goes to extraordinary measures, Python is
    compiled with a 2-byte per code point representation (UCS-2), Windows
    uses 2-byte unicode characters (also UCS-2),

    It depends. Python can be built such that a Unicode character is either
    2 bytes or 4 bytes. Most Pythons distributed with *nix distros will use
    the 4-byte option, although if you build Python yourself you will end up
    with the 2-byte option by default. Windows and OS X builds use the
    2-byte option. You can tell what you have by looking at the
    sys.maxunicode value. If it is 65535 then your unicode chars are 2
    bytes each. If it's something like 1114111 then they are 4 bytes.
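The sys.maxunicode check just described can be sketched as follows. The function name `describe_unicode_build` is made up for illustration. Note that the distinction applies to the interpreters discussed in this thread; from Python 3.3 onward the value is always 1114111, so the check no longer tells narrow and wide builds apart there.

```python
import sys


def describe_unicode_build(maxunicode=sys.maxunicode):
    """Classify the interpreter by the largest code point it reports."""
    if maxunicode == 0xFFFF:  # 65535
        return "narrow build: 2 bytes per unicode character"
    # 0x10FFFF (1114111) on wide builds and on Python 3.3 and later
    return "wide build: full code point range"


# Report for the interpreter running this sketch.
current = describe_unicode_build()
```

Passing the value as a parameter is only there so the two cases can be exercised on any interpreter.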

    and the underlying native
    controls (in Windows and I believe wxGTK) also use 2-byte characters (in
    UCS-2).

    GTK uses utf-8 for everything.

    In a Unicode build of wxWidgets/wxPython the wxString class will hold
    whatever the compiler's wchar_t type evaluates to. This can vary from
    platform to platform, and even from compiler to compiler. In practice
    though that's not a big deal for wxPython because there are functions in
    the Python C API that convert to/from wchar_t and whatever type Python
    is using for a Unicode char type, and if they happen to be the same then
    the functions are essentially a nop and have little overhead. So I use
    those functions when converting to/from wxString and Python Unicode
    objects and all is well.
