Databases

NAVIGATION
CATEGORIES
REFERRENCE
LINKS
  • Escaped backslash in SQL constant

    6 answers - 1359 bytes - related search similar search Add To My Delicious Add To My Stumble Upon Add To My Google Mark Add To My Facebook Add To My Digg Add To My Reddit

    Hello!
    The database cluster is initialized to use UNICDE. The client encoding
    is set to BIG5. The middleware escapes the backslash in the following
    string before writing to TEXT/VARCHAR column in server:
    a5 5c af e0
    This is a string comprises Big5 characters each of 2 octets big. Note
    that the second octets, 5c, of the first Big5 character clashes
    backslash '\'. Thus, this string is escaped to
    a5 5c 31 33 34 af e0
    and is inserted to backend something like this:
    INSERT INT x VALUES ('y\134na');
    Where octets 'y' and character "na" are unprintable characters here in
    terms of ASCII.
    Problem is that this string is stored exactly the same as the input: "a5
    5c 31 33 34 af e0" instead of "a5 5c af e0". The SELECT outputs the
    escaped string (7 octets) instead of the unescaped string (4 octets),
    too.
    However, the server manages the following string differently:
    INSERT INT x VALUES ('A\134B');
    and
    SELECT * FRM x;
    outputs
    A\B
    Its size stored in the column is 3 octets. This second case is exactly
    what I need. I guess strings like "C:\dir" is properly processed too
    though I did not test it.
    Why server treats the first string in this undesired way?
    Regards,
    CN
  • No.1 | | 947 bytes | |

    "CN" <cnliou9 (AT) fastmail (DOT) fmwrites:
    The database cluster is initialized to use UNICDE. The client encoding
    is set to BIG5. The middleware escapes the backslash in the following
    string before writing to TEXT/VARCHAR column in server:

    a5 5c af e0

    Seems to me that you need to fix your broken middleware it has no
    business doing that.

    This is a string comprises Big5 characters each of 2 octets big. Note
    that the second octets, 5c, of the first Big5 character clashes
    backslash '\'. Thus, this string is escaped to

    a5 5c 31 33 34 af e0

    Now you have a 2-octet Big5 character, followed by 3 1-octet ASCII
    digits, followed by a 2-octet Big5 character. All the subsequent
    processing is doing what it should with this, AFAICS.

    regards, tom lane

    (end of broadcast)
    TIP 4: Have you searched our list archives?

    http://archives.postgresql.org
  • No.2 | | 1765 bytes | |

    Many thanks for the lightening fast answer!

    The database cluster is initialized to use UNICDE. The client encoding
    is set to BIG5. The middleware escapes the backslash in the following
    string before writing to TEXT/VARCHAR column in server:

    a5 5c af e0

    Seems to me that you need to fix your broken middleware it has no
    business doing that.

    Are you suggesting that the middleware should not escape backslashes in
    the first place?

    The doc in
    #SQL-SYNTAX-CNSTANTS
    reads:

    "Any other character following a backslash is taken literally. Thus, to
    include a backslash in a string constant, write two backslashes."

    My understanding on this statement is that escaping \ to \\ blindly for
    any string and any encoding hurts nothing. Also, I thought that escaping
    "\" to "\134" for TEXT column should be always equivalent to escaping it
    to "\\", at least with this current (8.1) version.

    This is a string comprises Big5 characters each of 2 octets big. Note
    that the second octets, 5c, of the first Big5 character clashes
    backslash '\'. Thus, this string is escaped to

    a5 5c 31 33 34 af e0

    Now you have a 2-octet Big5 character, followed by 3 1-octet ASCII
    digits, followed by a 2-octet Big5 character. All the subsequent
    processing is doing what it should with this, AFAICS.

    What I still don't realize is that as mentioned in my first posting,
    "A\134B" gets stored in TEXT with value "A\B" but the Big5 string
    escaped in the same manner yeids different result - the latter string is
    stored in escaped form and becomes longer than intended. I thought two
    2-octet Big5 characters would be stored in backend.

    Regards,

    CN
  • No.3 | | 997 bytes | |

    "CN" <cnliou9 (AT) fastmail (DOT) fmwrites:
    >Seems to me that you need to fix your broken middleware it has no
    >business doing that.


    Are you suggesting that the middleware should not escape backslashes in
    the first place?

    No, I'm suggesting that it shouldn't be let loose on Big5 data when it
    evidently hasn't a clue about that encoding. The byte in question
    *is not* a backslash, it's not even an independent character; and so
    changing it on the assumption that it is logically a backslash simply
    breaks the data.

    Your quickest route to a solution may be to avoid Big5 in favor of
    an encoding that is ASCII-safe, such as UTF8. You can feed that through
    code that only understands ASCII with much less risk than an encoding
    where second and later bytes might look like ASCII.

    regards, tom lane

    (end of broadcast)
    TIP 3: Have you checked our extensive FAQ?

  • No.4 | | 1270 bytes | |

    No, I'm suggesting that it shouldn't be let loose on Big5 data when it
    evidently hasn't a clue about that encoding. The byte in question
    *is not* a backslash, it's not even an independent character; and so
    changing it on the assumption that it is logically a backslash simply
    breaks the data.

    Would you please enlighten me the behavior of the backend - why

    SET CLIENT_ENCDING T Big5;
    INSERT INT y VALUES ('A\134B');

    stores

    A\B

    while

    INSERT INT y VALUES ('y\134na');
    --"y\" and "na" are two Big5 characters.

    stores

    y\134na

    instead of

    y\na

    Your quickest route to a solution may be to avoid Big5 in favor of
    an encoding that is ASCII-safe, such as UTF8. You can feed that through
    code that only understands ASCII with much less risk than an encoding
    where second and later bytes might look like ASCII.

    Are you suggesting me to implement the middleware that will translate
    Big5 input to UTF8 and then escape the latter before sending it to
    PostgreSQL?

    SET CLIENT_ENCDING T UTF8;
    [BIG5 string from user] [middleware] [UTF8] [escaped UTF8]
    PostgreSQL (initdb with -E UNICDE)

    Best regards,

    CN
  • No.5 | | 901 bytes | |

    "CN" <cnliou9 (AT) fastmail (DOT) fmwrites:
    INSERT INT y VALUES ('y\134na');
    --"y\" and "na" are two Big5 characters.

    I'm not sure how to explain it any more clearly: the backslash in this
    example is not a backslash. It's a byte within a multibyte character,
    which *entirely coincidentally* happens to have the same numeric value
    as an ASCII backslash. But it isn't a backslash. And it won't be
    processed as a backslash by any Big5-aware code.

    Code that does not understand about multibyte characters is simply
    unsafe to apply to data that is in Big5. You need to fix that
    middleware to understand Big5 encoding; or if that seems impractical,
    switch to using another encoding for the data the middleware sees.

    regards, tom lane

    (end of broadcast)
    TIP 2: Don't 'kill -9' the postmaster
  • No.6 | | 1039 bytes | |

    Thank you again for the clarification!

    I'm not sure how to explain it any more clearly: the backslash in this
    example is not a backslash. It's a byte within a multibyte character,
    which *entirely coincidentally* happens to have the same numeric value
    as an ASCII backslash. But it isn't a backslash. And it won't be
    processed as a backslash by any Big5-aware code.

    If I understand this explanation correctly, then you have given me the
    answer I needed!
    Please correct me if my understanding that follows is wrong again:

    PostgreSQL is Big5-aware code. It does not blindly unescape every
    backlash it encounters in SQL literals. Instead, it sees backslash as
    part of some Big5 characters as they are supposed to be when client
    encoding is set to Big5 (SET CLIENT_ENCDING T BIG5). In other words,
    PostgreSQL sees the backslash in "C:\134" differently from that being
    part of multi-byte characters depending on which client encoding is
    used.

    Regards,

    CN

Re: Escaped backslash in SQL constant


max 4000 letters.
Your nickname that display:
In order to stop the spam: 3 + 2 =
QUESTION ON "Databases"

EMSDN.COM