C/C++

  • The C execution character set


    This question pertains to both Standard C and platform-dependent
    features, hence the crosspost.
    I'm trying to understand exactly how the "execution character set"
    works. On GNU/Linux, using GCC >= 3.4, if I compile a C source file
    (in any encoding), by default the execution character set is UTF-8, and
    the wide execution character set is UTF-32.
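    A minimal sketch of how to inspect what the execution character set
    actually stores: a narrow string literal ends up in the program as a
    sequence of bytes, and dumping those bytes shows the encoding.
    dump_bytes is a hypothetical helper, not part of any library. The
    example call uses hex escapes for the UTF-8 encoding of U+00D1 so
    that the result does not depend on the source file encoding.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the bytes of the narrow string s into out
   as space-separated hex. Returns the number of bytes dumped. */
size_t dump_bytes(const char *s, char *out, size_t outsz)
{
    size_t n = 0, used = 0;

    if (outsz > 0)
        out[0] = '\0';
    for (; s[n] != '\0'; n++) {
        int w = snprintf(out + used, outsz - used, "%s%02x",
                         n ? " " : "", (unsigned char)s[n]);
        if (w < 0 || (size_t)w >= outsz - used) {
            /* Buffer full: drop the partial item and stop. */
            if (used < outsz)
                out[used] = '\0';
            break;
        }
        used += (size_t)w;
    }
    return n;
}
```

    For example, dump_bytes("\xc3\x91", buf, sizeof buf) with a 32-byte
    buf returns 2 and leaves "c3 91" in buf. Passing a literal typed
    directly in the source shows which bytes the compiler chose for it.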
    What I want to understand is what the implications of this are on the
    various operations I might want to perform on any of the strings. As
    an example:
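    #include <locale.h>
    #include <stdio.h>
    #include <wchar.h>

    int
    main (void)
    {
      /* Use the locale named by the environment. */
      setlocale (LC_ALL, "");

      /* Narrow (byte-oriented) output on stdout. */
      printf("N\n");
      printf("%ls\n", L"N");

      /* Make stderr wide-oriented, then write wide output to it. */
      fwide(stderr, 1);
      fwprintf(stderr, L"N\n");
      fwprintf(stderr, L"%s\n", "N");

      printf("N\n");
      return 0;
    }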
    If I run in a normal (UTF-8) locale:
    $ ./test
    N
    N
    N
    N
    N
    Now, running in a C locale:
    $ ./test
    'Name3'
    N
    N
    "N" and "N" are the same. These passed through
    byte-for-byte identical. No conversions took place, I think.
    "N" (narrow) was lost. Why?
    "N" (wide) was lost. Why?
    "N" (wide) was *not* lost. Moreover, it was
    transliterated (UTF-32->US-ASCII) into a readable form for the locale.
    Where does the conversion take place, and how does the C runtime know
    what the source and destination charset are? I can't replicate the
    conversion with iconv(), so I'd like to know how to do it by hand.
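    One place to look, as a sketch rather than a full answer: the C
    standard specifies that printf's %ls conversion turns wide characters
    into multibyte characters as if by calling wcrtomb, which uses the
    LC_CTYPE category of the current locale as the destination encoding.
    The same conversion can be done by hand with wcstombs. The helper
    name wide_to_locale below is hypothetical.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert ws to the multibyte encoding of the
   current LC_CTYPE locale. Returns the number of bytes written
   (excluding the terminator), or -1 if the buffer is empty or some
   character cannot be represented in that encoding. */
long wide_to_locale(const wchar_t *ws, char *out, size_t outsz)
{
    size_t n;

    if (outsz == 0)
        return -1;

    /* Reserve one byte so the result can always be terminated. */
    n = wcstombs(out, ws, outsz - 1);
    if (n == (size_t)-1)
        return -1;   /* unrepresentable character in this locale */

    out[n] = '\0';
    return (long)n;
}
```

    Called after setlocale (LC_ALL, "") in a UTF-8 locale, this should
    produce UTF-8 bytes. In the C locale a non-ASCII wide character
    typically makes wcstombs fail, which is one plausible reason a
    string could be lost on the narrow stream. This sketch does not
    reproduce the transliteration seen on the wide stream.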
    I'd like to understand why each of these cases works the way it
    does.
    Thanks,
    Roger
    Roger Leigh
