New: codecvt locale facet is broken (reproducible crash)
22 answers - 2080 bytes -

The attached source file (UTF-8 encoded) demonstrates that codecvt
is broken for the simplest of transformations (UTF-8 to UCS-4).
This is pretty basic, and the underlying gconv stuff works correctly,
so the bug is either in libstdc++6 or somewhere inline in the headers.
$ ./wide
wide: /iconv/loop.c:425: utf8_internal_loop_single: Assertion `inptr -
bytebuf > (state->__count & 7)' failed.
Aborted
While running:
(gdb) bt
#0 0x0fcc672c in () from /lib/tls/libc.so.6
#1 0x0fe0425c in ? () from /lib/tls/libc.so.6
#2 0x0ffa6ef8 in std::codecvt<wchar_t, char, __mbstate_t>::do_in ()
from /usr/lib/libstdc++.so.6
#3 0x100016b4 in std::__codecvt_abstract_base<wchar_t, char, __mbstate_t>::in
(this=0x100290b8, __state=@0x7fa405a8, __from=0x10013014
"ESC%GESC%@2
37",
__from_end=0x1001301d "", __from_next=@0x7fa405b0, __to=0x7fa405bc,
__to_end=0x7fa406fc, __to_next=@0x7fa405b4)
at
/
odecvt.h:204
#4 0x10001244 in to_wide_string (str=@0x7fa40758, locale=@0x7fa40738)
at wide.cc:22
#5 0x10001544 in main () at wide.cc:59
Program received signal SIGABRT, Aborted.
0x0fcd67bc in raise () from /lib/tls/libc.so.6
(gdb) bt
#0 0x0fcd67bc in raise () from /lib/tls/libc.so.6
#1 0x0fcd82c0 in abort () from /lib/tls/libc.so.6
#2 0x0fcce768 in __assert_fail () from /lib/tls/libc.so.6
#3 0x0fcc6c7c in () from /lib/tls/libc.so.6
#4 0x0fcc6c7c in () from /lib/tls/libc.so.6
#5 0x0fcc6c7c in () from /lib/tls/libc.so.6
#6 0x0fcc6c7c in () from /lib/tls/libc.so.6
#7 0x0fcc6c7c in () from /lib/tls/libc.so.6
#8 0x0fcc6c7c in () from /lib/tls/libc.so.6
#9 0x0fcc6c7c in () from /lib/tls/libc.so.6
#10 0x0fcc6c7c in () from /lib/tls/libc.so.6
#11 0x0fcc6c7c in () from /lib/tls/libc.so.6
Previous frame inner to this frame (corrupt stack?)
It affects GCC 4.2 (20060613), 4.1, 4.0, 3.3
on Debian GNU/Linux (unstable).
The program works correctly with 3.4:
$ g++-3.4 -o wide wide.cc
$ ./wide
1
$
Regards,
Roger
No.1 | | 586 bytes |
| 
Comment #2 from pcarlini at suse dot de 2006-06-16 13:30
Humm, this is really puzzling because nothing non-trivial changed in that area
going from 3.4 to 4.0 and of course we all run daily the testsuite which
includes quite a few codecvt tests, which always pass smoothly. Could you
please compare/contrast your issue to existing testcases in
testsuite/22_locale/codecvt?
Anyway, if I save the attached wide.cc from the browser and compile/run it,
then I get "1 4 1 4" without end. Is that the expected result? Can you
help us reproduce the problem? Thanks,
No.2 | | 311 bytes |
| 
Comment #11 from pcarlini at suse dot de 2006-06-16 14:20
(In reply to comment #9)
Humm, wait, I'm working on x86-linux! Is that target specific? You can see the
issue only on powerpc?
Well, in any case all the codecvt regression tests are always fine on powerpc
and powerpc64-linux too.
No.3 | | 287 bytes |
| 
Comment #10 from rleigh at debian dot org 2006-06-16 14:20
Yes, this is all on the same Debian installation. 3.3, 3.4, 4.0, 4.1 and 4.2
(snapshot) are available. All but 3.4 exhibit this problem.
I will test on an i686 system in a moment to check if it's powerpc-only.
No.4 | | 224 bytes |
| 
Comment #14 from pcarlini at suse dot de 2006-06-16 15:09
Can you please tell us the glibc version? I'm asking because I can reproduce on
an ia64 machine using glibc2.4, not on all the glibc2.3.6 systems I tried.
No.5 | | 182 bytes |
| 
Comment #1 from rleigh at debian dot org 2006-06-16 13:09
Created an attachment (id=11679)
()
Testcase to show codecvt crash
Compile with
g++ -o wide wide.cc
No.6 | | 508 bytes |
| 
Comment #4 from pcarlini at suse dot de 2006-06-16 13:49
(In reply to comment #3)
> The source is UTF-8 encoded, and it assumes you are going to run it in a UTF-8
> locale. That might possibly be why you get odd output.
> The expected output should be as per the GCC 3.4 output in the original report:
> $ g++-3.4 -o wide wide.cc
> $ ./wide
> 1
> $
OK, thanks. Then I used the "en_US.UTF-8" locale and it worked fine with both
mainline and stock 4.1.1: no crashes, apparently same output.
No.7 | | 963 bytes |
| 
Comment #15 from rleigh at debian dot org 2006-06-16 16:16
$ uname -a
Linux hardknott 2.6.16.17 #7 Sun May 21 15:39:23 BST 2006 ppc GNU/Linux
$ /lib/libc.so.6
GNU C Library stable release version 2.3.6, by Roland McGrath et al.
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.0.4 20060507 (prerelease) (Debian 4.0.3-3).
Compiled on a Linux 2.6.13 system on 2006-06-08.
Available extensions:
GNU libio by Per Bothner
crypt add-on version 2.1 by Michael Glad and others
GNU Libidn by Simon Josefsson
linuxthreads-0.10 by Xavier Leroy
BIND-8.2.3-T5B
libthread_db work sponsored by Alpha Processor Inc
NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
software FPU emulation by Richard Henderson, Jakub Jelinek and others
No.8 | | 207 bytes |
| 
Comment #16 from pcarlini at suse dot de 2006-06-16 16:56
I can reproduce on an ia64-linux machine, so confirmed, but very puzzling on
the libstdc++-v3 side, no idea how/when we are going to deal with it.
No.9 | | 1616 bytes |
| 
Comment #17 from rleigh at debian dot org 2006-06-16 16:59
Created an attachment (id=11682)
()
Use mbsnrtowcs directly.
This testcase is similar to the original, with the exception that it uses
mbsnrtowcs in place of the codecvt locale facet. It also initialises the
locale with setlocale() for LC_CTYPE.
It shows some interesting results, in fact the exact opposite of the original
testcase:
GCC ver   powerpc   i386
3.3       fail      fail
3.4       OK        OK
4.0       OK        fail
4.1       OK        fail
4.2       OK        fail
With this test, the expected output is this:
$ ./wide2
1
The output for the failed tests:
GCC 3.3:
powerpc (GCC 3.3 was bad at wide streams; the output is "lost"):
$ ./wide2
1
i386:
$ ./wide2
wide2: /iconv/loop.c:425: utf8_internal_loop_single: Assertion `inptr -
bytebuf > (state->__count & 7)' failed.
Aborted
GCC 4.0/i386:
$ ./wide2
wide2: /iconv/loop.c:425: utf8_internal_loop_single: Assertion `inptr -
bytebuf > (state->__count & 7)' failed.
Aborted
GCC 4.1/i386:
$ ./wide2
wide2: /iconv/loop.c:425: utf8_internal_loop_single: Assertion `inptr -
bytebuf > (state->__count & 7)' failed.
Aborted
GCC 4.2/i386:
$ ./wide2
wide2: /iconv/loop.c:425: utf8_internal_loop_single: Assertion `inptr -
bytebuf > (state->__count & 7)' failed.
Aborted
Please do allow for the fact that one (or both) of these testcases might be
buggy; I've never used these interfaces before. However the behaviour is
still highly variable between the two platforms.
Regards,
Roger
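For reference, here is a minimal sketch of the mbsnrtowcs approach described in this comment. It is not the attached testcase: the function name widen_with_mbsnrtowcs is hypothetical, and it assumes the caller has already selected a suitable LC_CTYPE with setlocale(). It also zeroes the mbstate_t before the call, which is the step that comment #26 later identifies as missing from the testcases.

```cpp
#include <wchar.h>    // mbsnrtowcs (POSIX.1-2008)
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: convert a multibyte string to a wide string using the
// current LC_CTYPE locale. Assumes setlocale() was called by the caller.
std::wstring widen_with_mbsnrtowcs(const std::string& str)
{
    // The conversion state must start out as the initial state.
    std::mbstate_t state;
    std::memset(&state, 0, sizeof state);

    // One wide character per input byte is assumed to be enough room.
    std::vector<wchar_t> buf(str.size() + 1);

    const char* src = str.c_str();
    std::size_t n = mbsnrtowcs(&buf[0], &src, str.size(), buf.size(), &state);
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");

    return std::wstring(&buf[0], n);
}
```

In the default "C" locale this should at least handle plain ASCII input; multibyte input such as UTF-8 needs a matching locale to be installed and selected first.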
No.10 | | 252 bytes |
| 
Comment #18 from pcarlini at suse dot de 2006-06-16 17:03
OK, thanks. Before I go completely crazy, let's agree at least about a detail:
let's not involve 3.3: in 3.3 codecvt is known to be broken and was completely
rewritten for 3.4.
No.11 | | 302 bytes |
| 
--
pcarlini at suse dot de changed:
What |Removed |Added
AssignedTo|unassigned at gcc dot gnu |pcarlini at suse dot de
|dot org |
Status|WAITING |ASSIGNED
Ever Confirmed|0 |1
Last reconfirmed|2006-06-16 16:56:58 |2006-06-16 17:03:44
date| |
No.12 | | 200 bytes |
| 
--
pcarlini at suse dot de changed:
What |Removed |Added
AssignedTo|pcarlini at suse dot de |unassigned at gcc dot gnu
| |dot org
Status|ASSIGNED |NEW
No.13 | | 1360 bytes |
| 
Comment #19 from rleigh at debian dot org 2006-06-16 17:26
Created an attachment (id=11683)
()
C example using mbsnrtowcs
This testcase is the same as the last, but uses C only.
It looks like this:
GCC ver   powerpc   i386
3.3       OK        OK
3.4       OK        OK
4.0       OK        fail
4.1       OK        fail
4.2       OK        fail
The expected output is:
$ ./wide3
1
i386 (all failing versions):
$ ./wide3
Segmentation fault
(gdb) run
Starting program: /home/rleigh/wide3
Program received signal SIGSEGV, Segmentation fault.
0xa7e0e19d in (step=0x805ede0,
data=0xafc2a8d0, inptrp=0xafc2aa80, inend=0x8048754 "", outbufstart=0x0,
irreversible=0xafc2a8f8, do_flush=0, consume_incomplete=1)
at /iconv/loop.c:371
371 /iconv/loop.c: No such file or directory.
in /iconv/loop.c
(gdb) bt
#0 0xa7e0e19d in (step=0x805ede0,
data=0xafc2a8d0, inptrp=0xafc2aa80, inend=0x8048754 "", outbufstart=0x0,
irreversible=0xafc2a8f8, do_flush=0, consume_incomplete=1)
at /iconv/loop.c:371
#1 0xa7e65bd9 in __mbsnrtowcs (dst=0xafc2a93c, src=0xafc2aa80, nmc=9,
len=162, ps=0xafc2aa84) at mbsnrtowcs.c:106
#2 0x08048503 in print_wide (str=0x804874b "�237") at wide3.c:16
#3 0x080485f0 in main () at wide3.c:40
Both the powerpc and i386 systems are running the same version of glibc.
No.14 | | 260 bytes |
| 
Comment #20 from rleigh at debian dot org 2006-06-16 17:28
> Before I go completely crazy, let's agree at least about a detail:
> let's not involve 3.3: in 3.3 codecvt is known to be broken and was
> completely rewritten for 3.4.
Agreed :)
No.15 | | 716 bytes |
| 
Comment #21 from pcarlini at suse dot de 2006-06-16 18:10
OK, I think I have something meaningful to say: this definitely seems to be a
miscompilation. I would ask you to check on powerpc-linux what I'm seeing on
ia64-linux: the problem goes away if I build both libstdc++ and then the
testcase at " -g3". Therefore I would ask you to go inside the libstdc++-v3
dir of your build tree, do a make clean ; make CXXFLAGS=" -g3", reinstall
the library alone (no need to rebuild the compiler proper) and build the
testcase itself at " -g3". On ia64-linux the problem goes away. If you can
confirm, the difficult part begins ;) because we are supposed to prepare a
reduced testcase for the compiler people.
No.16 | | 418 bytes |
| 
Comment #22 from rleigh at debian dot org 2006-06-16 18:19
Just to summarise the current tests:
           wide          wide2         wide3
GCC ver    ppc   i386    ppc   i386    ppc   i386
3.4        OK    OK      OK    OK      OK    fail
4.0        fail  OK      OK    fail    OK    fail
4.1        fail  OK      OK    fail    OK    fail
4.2        fail  OK      OK    fail    OK    fail
GCC 3.4 is the most reliable, but I don't understand the pattern of failures.
I'll do a build in a moment as you suggest.
No.17 | | 190 bytes |
| 
Comment #23 from rleigh at debian dot org 2006-06-17 14:29
This will take a few more hours. I didn't have a built GCC tree to hand, so
I'm still waiting on "make bootstrap".
No.18 | | 2730 bytes |
| 
Comment #24 from rleigh at debian dot org 2006-06-18 00:27
/gcc-20060613/configure ,c++
/home/rleigh/gcc-test
$ ./wide
terminate called after throwing an instance of 'std::runtime_error'
what(): name not valid
Aborted
#0 0x0fcf77c8 in kill () at /string/bits/string2.h:998
#1 0x0fcf754c in GI_raise (sig=6) at
#2 0x0fcf8e68 in GI_abort () at /sysdeps/generic/abort.c:88
#3 0x0ffb273c in () at
#4 0x0ffaf87c in __cxxabiv1::__terminate (handler=0) at
#5 0x0ffaf8b8 in std::terminate () at
#6 0x0ffafa20 in __cxa_throw (obj=<value optimized out>, tinfo=<value
optimized out>, dest=<value optimized out>)
at
#7 0x0ff3a050 in std::__throw_runtime_error (__s=<value optimized out>) at
#8 0x0ffadd64 in (__cloc=<value
optimized out>, __s=<value optimized out>) at c++locale.cc:141
#9 0x0ff40154 in _Impl (this=0x10013080, __s=0x6 <Address 0x6 out of bounds>,
__refs=<value optimized out>)
at
#10 0x0ff41ac4 in locale (this=0x7fc83950, __s=<value optimized out>) at
#11 0x100015e8 in main () at wide.cc:54
$ ./wide2
1
$ ./wide3
1
Rebuilding libstdc++-v3 with 'make CXXFLAGS=" -g3"':
$ ./wide
terminate called after throwing an instance of 'std::runtime_error'
what(): name not valid
Aborted
(gdb) run
Starting program: /home/rleigh/wbug/wide
terminate called after throwing an instance of 'std::runtime_error'
what(): name not valid
Program received signal SIGABRT, Aborted.
0x0fcc57c8 in kill () at /string/bits/string2.h:998
998 /string/bits/string2.h: No such file or directory.
in /string/bits/string2.h
Current language: auto; currently c
(gdb) bt
#0 0x0fcc57c8 in kill () at /string/bits/string2.h:998
#1 0x0fcc554c in GI_raise (sig=6) at
#2 0x0fcc6e68 in GI_abort () at /sysdeps/generic/abort.c:88
#3 0x0ffaf7d4 in () at
#4 0x0ffaa238 in __cxxabiv1::__terminate (handler=0xffaf5ac
<()>)
at
#5 0x0ffaa288 in std::terminate () at
#6 0x0ffaa534 in __cxa_throw (obj=0x10013130, tinfo=0xffe2d58, dest=0xff1ea3c
<~runtime_error>)
at
#7 0x0ff120e4 in std::__throw_runtime_error (__s=0xffb7e04
" name not valid")
at
#8 0x0ffa7624 in (__cloc=@0x7fd11824,
__s=0x1001306c "en_GB.UTF8") at c++locale.cc:141
#9 0x0ff1bda4 in _Impl (this=0x10013080, __s=0x1001306c "en_GB.UTF8",
__refs=1) at
#10 0x0ff1de70 in locale (this=0x7fd11950, __s=0x10002364 "") at
#11 0x10001748 in main () at wide.cc:54
$ ./wide2
1
$ ./wide3
1
Regards,
Roger
No.19 | | 943 bytes |
| 
Comment #25 from pcarlini at suse dot de 2006-06-18 09:35
(In reply to comment #24)
> terminate called after throwing an instance of 'std::runtime_error'
> what(): name not valid
This is the standard throw which happens when a named locale cannot be used; it
has nothing to do with the issue we are discussing and is expected
behavior. The only possible explanation is that the GNU locale model has been
disabled by the configure-time tests. Do you have a full set of locales
installed, in particular de_DE? See also these notes for additional details:
Anyway, at this point it is almost certain that we are dealing with a
miscompilation: nothing changed in the library code, and the problem happens
with the 4.x compilers (new technology, SSA, etc.), which is a strong indication
of that (besides my 100% reproducible tests on ia64-linux and all the other
checks).
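As a small illustration of the "name not valid" throw described above, here is a sketch of how a program could guard the named-locale constructor. The helper name locale_or_classic is hypothetical and is not part of the attached testcases; falling back to the classic "C" locale is only one possible policy.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: try to construct a named locale and fall back to the
// classic "C" locale if the name cannot be used on this system.
std::locale locale_or_classic(const char* name)
{
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        // std::locale's constructor throws std::runtime_error for an
        // unusable name; this is the exception seen in comment #24.
        return std::locale::classic();
    }
}
```

A caller could then write locale_or_classic("en_GB.UTF8") and compare the result's name() against "C" to detect that the requested locale was not available.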
No.20 | | 560 bytes |
| 
Comment #26 from rleigh at debian dot org 2006-06-18 09:51
Thiemo Seufer diagnosed this as a problem with the testcases: mbstate_t needs
explicitly initialising to all-bits-zero with memset. After doing this
std::memset(&state, 0, sizeof(mbstate_t));
all the testcases work for me on powerpc and i386.
Since this is not a bug, it can be closed. Sorry about that. Perhaps the
libstdc++ doxygen documentation for codecvt could document that
state_type/mbstate_t needs explicit initialisation before use.
Regards,
Roger
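To make the fix concrete, here is a minimal sketch of a codecvt-based conversion with the state zeroed before use. The function name to_wide_string is taken from the backtrace in the original report, but the body is an assumption and not the attached wide.cc; the buffer sizing (one wide character per input byte) is likewise assumed.

```cpp
#include <cstring>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Sketch only: convert a narrow string to a wide string using the codecvt
// facet of the given locale.
std::wstring to_wide_string(const std::string& str, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_type;
    const codecvt_type& cvt = std::use_facet<codecvt_type>(loc);

    // The fix from this thread: start from an all-bits-zero state.
    std::mbstate_t state;
    std::memset(&state, 0, sizeof(std::mbstate_t));

    // Assumption: one wchar_t per input byte is enough output space.
    std::vector<wchar_t> buf(str.size() + 1);

    const char* from = str.data();
    const char* from_end = from + str.size();
    const char* from_next = from;
    wchar_t* to = &buf[0];
    wchar_t* to_end = to + buf.size();
    wchar_t* to_next = to;

    std::codecvt_base::result res =
        cvt.in(state, from, from_end, from_next, to, to_end, to_next);
    if (res != std::codecvt_base::ok)
        throw std::runtime_error("codecvt::in did not convert the whole input");

    return std::wstring(to, to_next);
}
```

With std::locale::classic() this should at least round-trip plain ASCII; converting UTF-8 input needs a UTF-8 locale such as the ones discussed earlier in the thread.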
No.21 | | 876 bytes |
| 
Comment #27 from pcarlini at suse dot de 2006-06-18 10:03
(In reply to comment #26)
> Thiemo Seufer diagnosed this as a problem with the testcases: mbstate_t needs
> explicitly initialising to all-bits-zero with memset. After doing this
> std::memset(&state, 0, sizeof(mbstate_t));
> all the testcases work for me on powerpc and i386.
Funny. Actually, we still have bugs, in the testsuite only, where we never
do the initialization. I will fix that. Sorry about my part in the waste of
time; I'm learning some of those details along with you, as the current codecvt
was contributed by other people.
> Since this is not a bug, it can be closed. Sorry about that. Perhaps the
> libstdc++ doxygen documentation for codecvt could document that
> state_type/mbstate_t needs explicit initialisation before use.
> Regards,
> Roger
No.22 | | 139 bytes |
| 
Comment #28 from pcarlini at suse dot de 2006-06-18 10:13
Correction: our testcases are already fine; zero_state does the job.
Anyway
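As a footnote to the initialisation discussion, here is a short sketch of an alternative to memset: value-initialising the state and checking it with std::mbsinit. This does not reproduce the testsuite's zero_state helper, whose definition is not shown in this thread; the function names below are hypothetical.

```cpp
#include <cwchar>

// Hypothetical helper: produce a conversion state in its initial state by
// value-initialisation, which zero-initialises the object.
std::mbstate_t make_initial_state()
{
    std::mbstate_t state = std::mbstate_t();
    return state;
}

// Hypothetical helper: report whether a state describes the initial
// conversion state, using the standard std::mbsinit check.
bool is_initial_state(const std::mbstate_t& state)
{
    return std::mbsinit(&state) != 0;
}
```

An uninitialised local mbstate_t, by contrast, has an indeterminate value, so passing it to a conversion function or to std::mbsinit is not meaningful.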