Perl

  • possible memory related bug when using sprintf with an utf-8 encoded format-string and iso


    # New Ticket Created by willem (AT) lunatech (DOT) com
    # Please include the string: [perl #39126]
    # in the subject line of all future correspondence about this issue.
    # <URL: >
    This is a bug report for perl from willem (AT) lunatech (DOT) com,
    generated with the help of perlbug 1.35 running under perl v5.8.8.
    Dear maintainers,
    For a project I'm working on, I've run into an issue that is difficult to understand.
    The code I'm working on translates EDI messages that may have various encodings.
    In some cases perl crashes when formatting a translated string, an operation which
    normally works. The error message returned is:
    glibc detected realloc(): invalid next size: 0x081fac98
    Aborted
    I've tried to pin down the root cause of the problem and managed to write two
    simple scripts that differ only slightly. One of them crashes as above, while
    the other runs normally.
    Some extra testing with valgrind of the crashing test script shows the following:
    Invalid write of size 1
    at 0x80D33E2: Perl_sv_vcatpvfn (in /usr/bin/perl)
    by 0x8107EB2: Perl_do_sprintf (in /usr/bin/perl)
    Address 0x651FBB4 is 0 bytes after a block of size 52 alloc'd
    Which in my eyes looks like a buffer overrun.
    I'm not sure if I can attach files with 'perlbug', and I certainly do not know how, so
    you'll find the two test scripts below. The first one crashes, the second one does not.
    The first test script is:
    sprintf-bug.pl
    use strict;
    use warnings;
    use Encode;
    my $format = decode("utf-8", encode('utf-8', "%5s%-10s%-35s%-35s"));
    my @records = ('', '', "\344\345", "\326");
    my $line = sprintf($format, @records);
    print STDOUT "$line\n";
    /sprintf-bug.pl
    And the second test script is:
    sprintf-nonbug.pl
    use strict;
    use warnings;
    use Encode;
    my $format = decode("utf-8", encode('utf-8', "%5s%-35s%-35s"));
    my @records = ('', "\344\345", "\326");
    my $line = sprintf($format, @records);
    print STDOUT "$line\n";
    /sprintf-nonbug.pl
    I hope this provides you with enough information to identify the actual
    bug and write a fix for it. I know that the code may seem funny by having
    the format string in utf-8, but I don't think that it should result in
    a crash.
    In any event, thanks for your time to look into it. If you need any assistance,
    then please, just let me know.
    Kindest regards,
    Willem-Jan Veen
    The Netherlands
    Flags:
    category=core
    severity=high
    Site configuration information for perl v5.8.8:
    Configured by Debian Project at Tue Apr 4 22:34:25 UTC 2006.
    Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
    Platform:
    osname=linux, osvers=2.6.15.4,
    uname='linux ninsei 2.6.15.4 #1 smp preempt mon feb 20 09:48:53 pst 2006 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dman1ext=1 -Dman3ext=3perl -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
    Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.0.3 (Debian 4.0.3-1)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
    Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.6.so, so=so, useshrplib=true, libperl=libperl.so.5.8.8
    gnulibc_version='2.3.6'
    Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
    Locally applied patches:
    @INC for perl v5.8.8:
    /etc/perl
    /usr/local/lib/perl/5.8.8
    /usr/local/share/perl/5.8.8
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8
    /usr/share/perl/5.8
    /usr/local/lib/site_perl
    .
    Environment for perl v5.8.8:
    HOME=/home/willem
    LANG (unset)
    LANGUAGE (unset)
    LC_CTYPE=en_US
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PERL_BADLANG (unset)
    SHELL=/bin/bash
  • No.1 | | 4004 bytes | |

    On 2006-05-11, at 11:58, willem (AT) lunatech (DOT) com (via RT) wrote:

    you'll find the two mentioned test scripts below. The first one
    crashes, the second one not.

    Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
    Platform:
    osname=linux, osvers=2.6.15.4,
    multi

    FWIW, I can't get either of these to crash _under normal
    circumstances_ on Mac OS X with any of these perls:

    This is perl, v5.9.4 DEVEL28156 built for darwin-thread-multi-2level

    This is perl, v5.8.8 built for darwin-2level (Sorry, don't have a
    threaded build to hand)

    This is perl, v5.8.6 built for darwin-thread-multi-2level

    However, if I use libgmalloc.dylib, "an aggressive debugging malloc
    library", I do get a crash for the first script. Here's the trace:

    Host Name: Tullamore
    Date/Time: 2006-05-11 20:17:43.941 +0200
    OS Version: 10.4.6 (Build 8I127)
    Report Version: 4

    Command: perl
    Path: ./perl
    Parent: bash [290]

    Version: ? (?)

    PID: 4525
    Thread: 0

    Exception: EXC_BAD_ACCESS (0x0001)
    Codes: KERN_PROTECTION_FAILURE (0x0002) at 0xb954b000

    Thread 0 Crashed:
    0 perl 0x000f0298 Perl_sv_vcatpvfn + 13036 (sv.c:9415)
    1 perl 0x000ecd98 Perl_sv_vsetpvfn + 108 (sv.c:8291)
    2 perl 0x0003e58c Perl_do_sprintf + 344 (doop.c:719)
    3 perl 0x00070224 Perl_pp_sprintf + 96 (pp.c:3305)
    4 perl 0x0013df04 Perl_runops_debug + 332 (dump.c:1734)
    5 perl 0x000289dc S_run_body + 524 (perl.c:2396)
    6 perl 0x000284ac perl_run + 192 (perl.c:2318)
    7 perl 0x00002bdc main + 232 (perlmain.c:105)
    8 perl 0x0000234c _start + 340 (crt.c:272)
    9 perl 0x000021f4 start + 60

    Thread 0 crashed with PPC Thread State 64:
    srr0: 0x00000000000f0298 srr1:
    0x100000000200d030 vrsave: 0x0000000000000000
    cr: 0x48000404 xer: 0x0000000000000000 lr:
    0x00000000000f0230 ctr: 0x0000000000000000
    r0: 0x0000000000000000 r1: 0x00000000bffff270 r2:
    0x0000000000000001 r3: 0x00000000b954afdf
    r4: 0x0000000000000021 r5: 0xffffffff46ab5021 r6:
    0x00000000c3a4c3a5 r7: 0x0000000020202020
    r8: 0x00000000b954afff r9: 0x00000000b954afcc r10:
    0x00000000aa5660d4 r11: 0x000000000019384c
    r12: 0x0000000090129ea0 r13: 0x000000000016d984 r14:
    0x0000000000000001 r15: 0x00000000b40291e0
    r16: 0x0000000000000000 r17: 0x0000000000000000 r18:
    0x0000000000000000 r19: 0x00000000b9530ffe
    r20: 0x0000000000000000 r21: 0x0000000000000000 r22:
    0x0000000000000003 r23: 0x0000000000000000
    r24: 0x00000000b0002390 r25: 0x00000000b4ad7050 r26:
    0x00000000b9548ff8 r27: 0x0000000000000000
    r28: 0x0000000000000021 r29: 0x00000000b954b000 r30:
    0x00000000b954afdb r31: 0x00000000000ecfc8

    Binary Images Description:
    0x1000 - 0x191fff perl /
    bit_perl-current/perl
    0x8fe00000 - 0x8fe51fff dyld 44.4/usr/lib/dyld
    0x90000000 - 0x901bbfff libSystem.B.dylib /usr/lib/libSystem.B.dylib
    0x90213000 - 0x90218fff libmathCommon.A.dylib /usr/lib/system/
    libmathCommon.A.dylib
    0x9141a000 - 0x91425fff libgcc_s.1.dylib /usr/lib/libgcc_s.1.dylib
    0x9605e000 - 0x9607efff libmx.A.dylib /usr/lib/libmx.A.dylib
    0x9a564000 - 0x9a566fff libgmalloc.dylib /usr/lib/libgmalloc.dylib

    That's for bleadperl; others are similar.

    (libgmalloc.dylib assigns each malloc() a new memory page with a guard
    page beyond it, and places the last byte of the allocation on the last
    byte of the page (modulo alignment considerations). The corresponding
    free() unmaps the page. Thus monkey business results in a crash.)

    The guilty code line is

    sv.c:9415 *p = '\0';

    but I'm afraid I can't see why p (which seems to correspond to r29)
    might be pointing to unallocated memory (not that I've had more than
    a cursory look).

    LC_CTYPE=en_US

    Putting LC_CTYPE=en_US in the environment (which contains no other
    locale-related variables) makes no difference to the symptoms.
  • No.2 | | 2698 bytes | |

    On Thu, 11 May 2006 02:58:48 -0700, "willem (AT) lunatech (DOT) com (via RT)" <perlbug-followup (AT) perl (DOT) org> wrote

    # New Ticket Created by willem (AT) lunatech (DOT) com
    # Please include the string: [perl #39126]
    # in the subject line of all future correspondence about this issue.
    # <URL: >

    Dear maintainers,

    Some extra testing with valgrind of the crashing test script shows the following:
    Invalid write of size 1
    at 0x80D33E2: Perl_sv_vcatpvfn (in /usr/bin/perl)
    by 0x8107EB2: Perl_do_sprintf (in /usr/bin/perl)
    Address 0x651FBB4 is 0 bytes after a block of size 52 alloc'd

    Which in my eyes looks like a buffer overrun.

    I hope this provides you with enough information to identify the actual
    bug and write a fix for it. I know that the code may seem funny by having
    the format string in utf-8, but I don't think that it should result in
    a crash.

    I think the following is more reproducible in perl-current:

    for my $i (100) {
    my $format = "%-". ($i * 7). "s";
    utf8::upgrade($format);
    my @records = ("\344\345" x $i);
    my $line = sprintf($format, @records);
    }

    The problem occurs near the end of Perl_sv_vcatpvfn:

    STRLEN width = 7*$i
    STRLEN have = 2*$i
    STRLEN need = 7*$i
    STRLEN gap = 5*$i = need - have
    STRLEN elen = 4*$i (after sv_utf8_upgrade)
    SvCUR(sv) at last = gap + elen = need + (elen - have) = 9*$i
    SvLEN(sv) at last = PERL_STRLEN_ROUNDUP(need + dotstrlen + 1)
    = PERL_STRLEN_ROUNDUP(7*$i+2)

    ! elen gets doubled by sv_utf8_upgrade(),
    but the need used in SvGROW(sv, SvCUR(sv) + need + dotstrlen + 1) does not account for it!
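
    The byte counts above can be reproduced from plain Perl. This is only a sketch for the $i = 100 case: the variable names mirror those in sv.c, but nothing here touches the interpreter internals, and measuring the UTF-8 length with utf8::encode on a copy is a choice made for illustration.

```perl
use strict;
use warnings;

my $i     = 100;
my $arg   = "\344\345" x $i;    # Latin-1 argument: 2*$i bytes, 2*$i characters
my $width = 7 * $i;             # field width requested by the format

# Sizes as computed before the argument is upgraded to UTF-8.
my $have = length $arg;                        # 2*$i
my $need = $have > $width ? $have : $width;    # 7*$i
my $gap  = $need - $have;                      # 5*$i

# Byte length of the argument once stored as UTF-8:
# U+00E4 and U+00E5 each take two bytes.
my $utf8 = $arg;
utf8::encode($utf8);
my $elen = length $utf8;                       # 4*$i

my $written   = $gap + $elen;                  # bytes actually appended: 9*$i
my $shortfall = $written - $need;              # elen - have = 2*$i

printf "have=%d need=%d gap=%d elen=%d written=%d shortfall=%d\n",
    $have, $need, $gap, $elen, $written, $shortfall;
```

    The shortfall is exactly the number of extra bytes introduced by the upgrade, which the growth request based on need never sees.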

    #sv.c 9341-9367
    /* calculate width before utf8_upgrade changes it */
    have = esignlen + zeros + elen;
    if (have < zeros)
        Perl_croak_nocontext(PL_memory_wrap);

    if (is_utf8 != has_utf8) {
        if (is_utf8) {
            if (SvCUR(sv))
                sv_utf8_upgrade(sv);
        }
        else {
            SV * const nsv = sv_2mortal(newSVpvn(eptr, elen));
            sv_utf8_upgrade(nsv);
            eptr = SvPVX_const(nsv);
            elen = SvCUR(nsv);
        }
        SvGROW(sv, SvCUR(sv) + elen + 1);
        p = SvEND(sv);
        *p = '\0';
    }

    need = (have > width ? have : width);
    gap = need - have;

    if (need >= (((STRLEN)~0) - SvCUR(sv) - dotstrlen - 1))
        Perl_croak_nocontext(PL_memory_wrap);
    SvGROW(sv, SvCUR(sv) + need + dotstrlen + 1);

    The earlier SvGROW(sv, SvCUR(sv) + elen + 1) doesn't help,
    since the later SvGROW(sv, SvCUR(sv) + need + dotstrlen + 1) still reserves too little.
    To fix this, perhaps we should rethink which quantities are in bytes
    and which are in characters.

    Regards,
    SADAHIRO Tomoyuki
  • No.3 | | 3144 bytes | |

    On Fri, 19 May 2006 01:29:36 +0900, SADAHIRO Tomoyuki <bqw10602 (AT) nifty (DOT) com> wrote

    On Thu, 11 May 2006 02:58:48 -0700, "willem (AT) lunatech (DOT) com (via RT)" <perlbug-followup (AT) perl (DOT) org> wrote

    Some extra testing with valgrind of the crashing test script shows the following:
    Invalid write of size 1
    at 0x80D33E2: Perl_sv_vcatpvfn (in /usr/bin/perl)
    by 0x8107EB2: Perl_do_sprintf (in /usr/bin/perl)
    Address 0x651FBB4 is 0 bytes after a block of size 52 alloc'd

    Which in my eyes looks like a buffer overrun.

    The problem occurs near the end of Perl_sv_vcatpvfn:

    Here is a patch. The test in t/op/sprintf2.t used \xe4, but
    in EBCDIC this byte is 'U', which utf8 upgrade leaves unchanged.
    \xb4 is upgraded into two octets even under EBCDIC, hence it's
    appropriate for the test.

    Regards,
    SADAHIRO Tomoyuki

    diff -urN perl-current@28232/sv.c perl/sv.c
    --- perl-current@28232/sv.c Thu May 18 05:54:33 2006
    +++ perl/sv.c Sun May 21 18:24:45 2006
    @@ -9338,26 +9338,28 @@
    continue;/* not "break" */
    }
    -/* calculate width before utf8_upgrade changes it */
    +if (is_utf8 != has_utf8) {
    + if (is_utf8) {
    +if (SvCUR(sv))
    + sv_utf8_upgrade(sv);
    + }
    + else {
    +const STRLEN old_elen = elen;
    +SV * const nsv = sv_2mortal(newSVpvn(eptr, elen));
    +sv_utf8_upgrade(nsv);
    +eptr = SvPVX_const(nsv);
    +elen = SvCUR(nsv);
    +
    +if (width) { /* fudge width (can't fudge elen) */
    + width += elen - old_elen;
    +}
    +is_utf8 = TRUE;
    + }
    +}
    +
    have = esignlen + zeros + elen;
    if (have < zeros)
    Perl_croak_nocontext(PL_memory_wrap);
    -
    -if (is_utf8 != has_utf8) {
    - if (is_utf8) {
    - if (SvCUR(sv))
    - sv_utf8_upgrade(sv);
    - }
    - else {
    - SV * const nsv = sv_2mortal(newSVpvn(eptr, elen));
    - sv_utf8_upgrade(nsv);
    - eptr = SvPVX_const(nsv);
    - elen = SvCUR(nsv);
    - }
    - SvGROW(sv, SvCUR(sv) + elen + 1);
    - p = SvEND(sv);
    - *p = '\0';
    -}

    need = (have > width ? have : width);
    gap = need - have;
    diff -urN perl-current@28232/t/op/sprintf2.t perl/t/op/sprintf2.t
    --- perl-current@28232/t/op/sprintf2.t Tue Dec 13 22:58:10 2005
    +++ perl/t/op/sprintf2.t Sun May 21 18:34:34 2006
    @@ -6,7 +6,7 @@
    require './test.pl';
    }
    -plan tests => 275;
    +plan tests => 280;

    is(
    sprintf("%.40g ",0.01),
    @@ -18,13 +18,14 @@
    sprintf("%.40f", 0.01)." ",
    q(the sprintf "%.<number>f" optimization)
    );
    -{
    -chop(my $utf8_format = "%-3s\x{100}");
    -is(
    -sprintf($utf8_format, "\xe4"),
    -"\xe4 ",
    -q(width calculation under utf8 upgrade)
    -);
    +
    +# cases of $i > 1 are against [perl #39126]
    +for my $i (1, 5, 10, 20, 50, 100) {
    + chop(my $utf8_format = "%-". 3*$i ."s\x{100}");
    + my $string = "\xB4"x$i; # latin1 ACUTE or ebcdic COPYRIGHT
    + my $expect = $string."  "x$i; # followed by 2*$i spaces
    + is(sprintf($utf8_format, $string), $expect,
    + "width calculation under utf8 upgrade, length=$i");
    }

    # Used to mangle PL_sv_undef
    End of patch
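
    For readers without a perl source tree, the regression test in the patch can be approximated as a standalone script. This is a sketch that assumes it runs on a perl which already contains the fix; on an affected build the sprintf call may corrupt memory instead. It uses utf8::upgrade to put the UTF-8 flag on the format rather than the chop trick from sprintf2.t, and the @failures bookkeeping is purely illustrative.

```perl
use strict;
use warnings;

my @failures;
for my $i (1, 5, 10, 20, 50, 100) {
    my $format = "%-" . 3 * $i . "s";
    utf8::upgrade($format);                   # force the UTF-8 flag on the format

    my $string = "\xB4" x $i;                 # $i Latin-1 characters, no UTF-8 flag
    my $expect = $string . " " x (2 * $i);    # left-justified, padded to 3*$i characters

    my $got = sprintf($format, $string);
    push @failures, $i
        unless $got eq $expect && length($got) == 3 * $i;
}

print @failures ? "failed for lengths: @failures\n" : "all widths correct\n";
```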
  • No.4 | | 800 bytes | |

    Here is a tweak of the test suite.
    The width can be replaced with an asterisk.

    diff -urN perl-current@28232/t/op/sprintf2.t perl/t/op/sprintf2.t
    --- perl-current@28232/t/op/sprintf2.t Tue Dec 13 22:58:10 2005
    +++ perl/t/op/sprintf2.t Sun May 21 18:34:34 2006

    +# cases of $i > 1 are against [perl #39126]
    +for my $i (1, 5, 10, 20, 50, 100) {
    + chop(my $utf8_format = "%-". 3*$i ."s\x{100}");

    + chop(my $utf8_format = "%-*s\x{100}");

    + my $string = "\xB4"x$i; # latin1 ACUTE or ebcdic COPYRIGHT
    + my $expect = $string."  "x$i; # followed by 2*$i spaces
    + is(sprintf($utf8_format, $string), $expect,

    + is(sprintf($utf8_format, 3*$i, $string), $expect,

    + "width calculation under utf8 upgrade, length=$i");

    Regards,
    SADAHIRO Tomoyuki
  • No.5 | | 1209 bytes | |

    # New Ticket Created by willem (AT) lunatech (DOT) com
    # Please include the string: [perl #39126]
    # in the subject line of all future correspondence about this issue.
    # <URL: >

    Some extra testing with valgrind of the crashing test script shows the following:
    Invalid write of size 1
    at 0x80D33E2: Perl_sv_vcatpvfn (in /usr/bin/perl)
    by 0x8107EB2: Perl_do_sprintf (in /usr/bin/perl)
    Address 0x651FBB4 is 0 bytes after a block of size 52 alloc'd

    The problem occurs near the end of Perl_sv_vcatpvfn:

    To fix this, perhaps what are in bytes and what are in characters
    should be rethought.

    Currently the width and the precision for %s are in characters;
    for example:
    sprintf("%03s", "\x{100}") returns "00\x{100}".
    sprintf("%.3s", "\x{abcd}12345") returns "\x{abcd}12".

    But the width and the precision for %c are in bytes;
    for example:
    sprintf("%03c", 0x100) returns "0\x{100}".
    sprintf("%.2c", 0xabcd) returns "\352\257" (malformed UTF8).

    Then %n sets the number of characters output in bytes;
    for example:
    after sprintf("%s%n", "\x{beef}", $a), $a is set to 3.
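
    The %s cases can be checked directly. A small sketch, deliberately limited to %s because the behaviour of %c and %n described above may differ between perl versions; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Width: "\x{100}" is one character (two bytes in UTF-8), so a width
# of 3 adds two padding characters in front of it.
my $padded = sprintf("%3s", "\x{100}");

# Precision: ".3" keeps the first three characters, however many
# bytes each of them occupies internally.
my $cut = sprintf("%.3s", "\x{abcd}12345");

printf "padded length in characters: %d\n", length $padded;
printf "precision kept three characters: %s\n",
    $cut eq "\x{abcd}12" ? "yes" : "no";
```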

    Regards,
    SADAHIRO Tomoyuki
  • No.6 | | 347 bytes | |

    SADAHIRO Tomoyuki wrote:
    Here is a patch. The test in t/op/sprintf2.t used \xe4, but
    in EBCDIC this byte is 'U', which utf8 upgrade leaves unchanged.
    \xb4 is upgraded into two octets even under EBCDIC, hence it's
    appropriate for the test.

    Thanks, applied as change #28328 (with further test suite tweaks).
  • No.7 | | 1652 bytes | |

    On Sun, 28 May 2006 20:42:25 +0900, SADAHIRO Tomoyuki <bqw10602 (AT) nifty (DOT) com> wrote

    Currently the width and the precision for %s are in characters;
    for example:
    sprintf("%03s", "\x{100}") returns "00\x{100}".
    sprintf("%.3s", "\x{abcd}12345") returns "\x{abcd}12".

    But the width and the precision for %c are in bytes;
    for example:
    sprintf("%03c", 0x100) returns "0\x{100}".
    sprintf("%.2c", 0xabcd) returns "\352\257" (malformed UTF8).

    Then %n sets the number of characters output in bytes;
    for example:
    after sprintf("%s%n", "\x{beef}", $a), $a is set to 3.

    In my opinion, for Unicode strings these numbers should be counted
    in characters.

    First, just as printf() can take I/O layers, sprintf() can set
    the SVf_UTF8 flag on its return value. Hence the format string for
    both should follow character semantics.

    Second, a character in the right-hand part of latin1, say U+00DF, is
    represented by a two-byte sequence in UTF-8.
    Currently printf("%s%n", pack('U', 0xDF), $a) sets $a to 2 though
    the output is actually a single byte, since doio.c#Perl_do_print
    converts it from utf8 to bytes (from UTF-8 to Latin1 ifndef EBCDIC).
    To me, this result is inconsistent.

    Third, the output encoding can be freely changed through I/O layers.
    The number of bytes mapped to a Unicode character varies depending
    on the encoding, while the number of characters mapped to a Unicode
    character is almost always 1 in most encodings.
    If the numbers are in characters, the results coincide better.
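
    The third point can be illustrated with the core Encode module. This is a minimal sketch for the U+00DF character mentioned above; the choice of the three encodings (UTF-16BE in particular) and the %bytes hash are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $char = "\x{DF}";    # LATIN SMALL LETTER SHARP S, one character

# Encode the same single character in several output encodings and
# record how many bytes each encoding needs for it.
my %bytes;
for my $enc (qw(iso-8859-1 UTF-8 UTF-16BE)) {
    $bytes{$enc} = length encode($enc, $char);
    printf "%-10s 1 character -> %d byte(s)\n", $enc, $bytes{$enc};
}
```

    The character count stays at 1 throughout, while the byte count depends on the layer that ends up doing the output.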

    Regards,
    SADAHIRO Tomoyuki
