Perl

  • perlbug AutoReply: UTF-16 and regular expressions causes compilation failure


    My apologies - where I wrote utf.pm, I should have written utf8.pm.
    Running under the Perl debugger with trace on, ends as follows:
    main::(test4.pl:17): $text =~ m![t]*!;
    main::CODE(0x95c3d68)(test4.pl:17):
    17: $text =~ m![t]*!;
    Encode::find_encoding(/):
    121: my ($name, $skip_external) = @_;
    Encode::find_encoding(/):
    122: return __PACKAGE__->getEncoding($name,$skip_external);
    Encode::getEncoding(/):
    96: my ($class, $name, $skip_external) = @_;
    Encode::getEncoding(/):
    98: ref($name) && $name->can('renew') and return $name;
    Encode::getEncoding(/):
    99: exists $Encoding{$name} and return $Encoding{$name};
    Encode::Unicode::renew(/ 6):
    46: my $self = shift;
    Encode::Unicode::renew(/ 7):
    47: $BOM_Unknown{$self->name} or return $self;
    Encode::Encoding::name(/ 18):
    18: sub name { return shift->{'Name'} }
    Encode::Unicode::renew(/ 8):
    48: my $clone = bless { %$self } => ref($self);
    Encode::Unicode::renew(/ 9):
    49: $clone->{renewed}++; # so the caller knows it is renewed.
    Encode::Unicode::renew(/ 0):
    50: return $clone;
    Encode::Encoding::needs_lines(/ ing.pm:34):
    34: sub needs_lines { 0 };
    Encode::Encoding::DESTROY(/ pm:64):
    64: sub DESTROY {}
    UTF-16:Unrecognised BOM 7061, <IN> line 1.
    at / line 0
    require utf8.pm called at test4.pl line 17
    main::BEGIN() called at / line 0
    eval {} called at / line 0
    Compilation failed in require at test4.pl line 17, <IN> line 1.
    BEGIN failed--compilation aborted, <IN> line 1.
    at test4.pl line 0
    Debugged program terminated. Use q to quit or R to restart,
  • No.1 | | 1017 bytes | |

    Sun, 20 Aug 2006 15:23:11 +1200, <ian.goodacre (AT) xtra (DOT) co.nz> wrote:

    > My apologies - where I wrote utf.pm, I should have written utf8.pm.
    >
    > Running under the Perl debugger with trace on, ends as follows:
    >
    > UTF-16:Unrecognised BOM 7061, <IN> line 1.
    > at / line 0
    > require utf8.pm called at test4.pl line 17
    > main::BEGIN() called at / line 0
    > eval {} called at / line 0
    > Compilation failed in require at test4.pl line 17, <IN> line 1.
    > BEGIN failed--compilation aborted, <IN> line 1.
    > at test4.pl line 0
    > Debugged program terminated. Use q to quit or R to restart,

    As loading utf8.pm for utf8::SWASHNEW() in utf8_heavy.pl is necessary
    to compile a Unicode character class, swash_init() in utf8.c tried to
    load utf8.pm if SWASHNEW is not defined, but (perhaps) wrongly assumed
    it to be encoded in UTF-16, and then failed in loading it.

    It seems this is fixed in Perl 5.9.4, though I'm not sure what
    changed it.

    Regards,
    SADAHIRO Tomoyuki
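
The value 7061 in the error message can be read as two ASCII bytes. The following is a minimal Perl sketch, not the Encode implementation. It assumes the decoder reads the first two bytes of its input as a big-endian 16-bit byte-order mark, and that the module source begins with the word `package`; neither assumption is confirmed in the thread. The helper name `describe_bom` is made up for illustration. Under those assumptions, plain ASCII source yields exactly the value reported:

```perl
use strict;
use warnings;

# The two byte-order marks a UTF-16 decoder would accept,
# as seen when the first two bytes are read big-endian.
my %known_bom = (0xFEFF => 'big-endian', 0xFFFE => 'little-endian');

# Hypothetical helper: read the first two bytes of a buffer as a
# big-endian 16-bit value and say whether it is a valid UTF-16 BOM.
sub describe_bom {
    my ($bytes) = @_;
    my $bom = unpack 'n', substr($bytes, 0, 2);
    return exists $known_bom{$bom}
        ? sprintf('%04x (%s)', $bom, $known_bom{$bom})
        : sprintf('%04x (unrecognised)', $bom);
}

# Assumption: the module source starts with a plain "package" statement.
print describe_bom("package utf8;\n"), "\n";
print describe_bom("\xFE\xFFtext"), "\n";
```

The bytes `p` (0x70) and `a` (0x61) combine to 0x7061, which is consistent with the explanation that ASCII source was being handed to a UTF-16 decoder.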
  • No.2 | | 1393 bytes | |

    Sun, 20 Aug 2006 19:50:07 +0900, SADAHIRO Tomoyuki <bqw10602 (AT) nifty (DOT) com> said:


    > As loading utf8.pm for utf8::SWASHNEW() in utf8_heavy.pl is necessary
    > to compile a Unicode character class, swash_init() in utf8.c tried to
    > load utf8.pm if SWASHNEW is not defined, but (perhaps) wrongly assumed
    > it to be encoded in UTF-16, and then failed in loading it.
    >
    > It seems this is fixed in Perl 5.9.4, though I'm not sure what
    > changed it.

    I suspect

    Subject: [PATCH] optimize /[x]/ to /x/.
    From: demerphq <demerphq (AT) gmail (DOT) com>
    Date: Sat, 20 May 2006 23:16:33 +0200
    Message-Id: <@mail.gmail.com>
  • No.3 | | 1672 bytes | |

    8/21/06, Andreas J. Koenig <andreas.koenig.gmwojprw (AT) franz (DOT) ak.mind.de> wrote:

    > I suspect
    >
    > Subject: [PATCH] optimize /[x]/ to /x/.
    > From: demerphq <demerphq (AT) gmail (DOT) com>
    > Date: Sat, 20 May 2006 23:16:33 +0200
    > Message-Id: <@mail.gmail.com>

    Yeah, that probably did it. The thing is, I'd expect the bug to return
    if instead of

    $text =~ m![t]*!;

    we use

    $text =~ m![tT]*!;
  • No.4 | | 1850 bytes | |

    8/21/06, demerphq <demerphq (AT) gmail (DOT) com> wrote:

    > Yeah, that probably did it. The thing is, I'd expect the bug to return
    > if instead of
    >
    > $text =~ m![t]*!;
    >
    > we use
    >
    > $text =~ m![tT]*!;

    And a quick check shows the bug is truly gone. Which means my patch
    wasn't involved.

    Cheers,
    Yves
  • No.5 | | 1809 bytes | |

    Mon, 21 Aug 2006 09:53:32 +0200, demerphq wrote:

    > And a quick check shows the bug is truly gone. Which means my patch
    > wasn't involved.

    I think something else did it, too, since this issue isn't limited
    to /[ ]/. As commented in utf8_heavy.pl,
    cf.

    besides $text =~ m![t]*!,
    (1) case-insensitive matching, like $text =~ /t/i
    (2) case mapping operation for utf8, like $text = uc($text)
    (3) transliteration for utf8, like $text =~ tr/t\x{100}//
    (4) matching unicode properties, like $text =~ /\p{Latin}/
    also trigger an internal call to utf8::SWASHNEW.
    And in all these cases the bug seems to be gone.

    The change to $^H and %^H is probably related, though I must
    admit I'm not sure how open.pm interacts with load_module().

    Regards,
    SADAHIRO Tomoyuki
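
For anyone wanting to re-check the cases listed above, here is a minimal Perl sketch that runs each operation once on a string containing a code point above 0xFF. The variable names and the sample string are made up for illustration. The original test4.pl is not shown in the thread, so this is not a reproduction of the bug. Whether each operation still goes through utf8::SWASHNEW also depends on the Perl version, which this script does not detect:

```perl
use strict;
use warnings;

# A string holding a code point above 0xFF, so Perl stores it as characters.
my $text = "t\x{100}T";

my $class_match    = $text =~ m![tT]*!    ? 1 : 0;  # bracketed character class
my $nocase_match   = $text =~ /t/i        ? 1 : 0;  # (1) case-insensitive match
my $upper          = uc($text);                     # (2) case mapping
my $tr_count       = ($text =~ tr/t\x{100}//);      # (3) transliteration, counting only
my $property_match = $text =~ /\p{Latin}/ ? 1 : 0;  # (4) Unicode property

printf "class=%d nocase=%d tr=%d property=%d upper_ok=%d\n",
    $class_match, $nocase_match, $tr_count, $property_match,
    ($upper eq "T\x{100}T") ? 1 : 0;
```

On an affected build the script would be expected to die while compiling, before it prints anything. On a fixed build it runs to completion.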
  • No.6 | | 734 bytes | |

    Mon, 21 Aug 2006 09:53:32 +0200, demerphq <demerphq (AT) gmail (DOT) com> said:

    > And a quick check shows the bug is truly gone. Which means my patch
    > wasn't involved.

    Yeah, sorry, I was running out of time and the binary search was close,
    so I couldn't resist guessing. Now the binary search has completed and
    I see it was

    Change 28258 by nicholas@nicholas-saigo on 2006/05/20 17:29:52

    Abolish cop_io (the simple way) by storing the value in cop_hints_hash.
    Todo - store the in and out values under 2 keys, and avoid the need to
    create a temporary mortal SV while checking it.

    This time this is the result of binsearchaperl, so no guesswork involved.

    Sorry again for the confusion,
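
The idea of narrowing a behaviour change down to a single change number can be sketched as an ordinary binary search. This is only an illustration, not how binsearchaperl is implemented. The function name `first_fixed_change`, the range of change numbers, and the probe are hypothetical, and the probe merely stands in for "build perl at this change and run the test script". It assumes every change before the fix shows the bug and every change from the fix onward does not:

```perl
use strict;
use warnings;

# Return the first change in a sorted list for which $is_fixed returns true,
# assuming all earlier changes are "broken" and all later ones are "fixed".
sub first_fixed_change {
    my ($changes, $is_fixed) = @_;
    my ($lo, $hi) = (0, $#$changes);
    return undef unless @$changes && $is_fixed->($changes->[$hi]);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($is_fixed->($changes->[$mid])) {
            $hi = $mid;        # fix is at $mid or earlier
        }
        else {
            $lo = $mid + 1;    # fix is after $mid
        }
    }
    return $changes->[$lo];
}

# Hypothetical range of change numbers and a stand-in probe.
my @changes = (28200 .. 28300);
my $probe   = sub { $_[0] >= 28258 };
print first_fixed_change(\@changes, $probe), "\n";
```

Each probe halves the remaining range, so about seven builds are enough for a range of roughly a hundred changes.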
