: fix: Pattern Match fails for specific length string
8 answers - 1984 bytes -

On 6/23/06, via RT Erik R. <perlbug-followup (AT) perl (DOT) org> wrote:
# New Ticket Created by "Erik R. "
# Please include the string: [perl #39583]
# in the subject line of all future correspondence about this issue.
# <URL: >
--
PGP SIGNED MESSAGE
Hash: SHA1
This is a bug report for perl from erik (AT) cloudshield (DOT) com,
generated with the help of perlbug 1.35 running under perl v5.8.5.
--
-
[Please enter your report here]
*NOTE*NOTE* in spite of using the perlbug installed from an RPM,
I've also reproduced this with locally built versions of v5.8.8 and
v5.9.3.
The following script produces an unexpected result. Changing the
value of $size to ANY other value (as far as empirical evidence shows)
restores the expected behavior.
-
#!/usr/bin/perl
# The printing characters
my @chars = ("\n", "\t", map {chr} 040..0177);
# \376 also works, I haven't tried other values
my $delim = "\0";
# add a +1 or -1 (or change to any other value) to make this succeed.
my $size = 32771 - 4;
my $test = '';
# create some random junk. Inefficient, but it works.
for ($i = 0 ; $i < $size ; $i++) {
    $test .= $chars[int(rand(@chars))];
}
$test .= ($delim x 4);
$test =~ s/^(.*?)${delim}{4}//s;
print "Should be empty: $test\n\n";
print "Should be 0: ", length($test), "\n";
print "Should be $size: ", length($1), "\n";
Attached patch fixes the problem.
We should probably convert the sample code to a test. Attached is a
modified version of the OP's test to use. I changed it to stay in the
"visible" range of characters, as the re debug code doesn't play nicely
with control chars and nulls, so you can't see what's going on. Which in
itself should be a TODO, as it should be possible to debug such cases.
Sorry, I don't have time to do the test part right now.
cheers,
Yves
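
The suggestion above (turn the report's script into a test that stays in the visible character range) might look roughly like the sketch below. It is not the test attached in this thread: the helper name `lazy_match_lengths`, the lowercase-letter alphabet, and the `!` delimiter are assumptions chosen only so the regex debug output stays readable.

```perl
use strict;
use warnings;

# Hypothetical helper: build $size visible characters followed by four
# delimiter characters, then strip the lazy prefix plus the delimiters.
# Returns (length of what is left, length of the captured prefix).
sub lazy_match_lengths {
    my ($size) = @_;
    my $delim = '!';                                 # visible delimiter (assumption)
    my @chars = map { chr } ord('a') .. ord('z');    # never contains $delim
    my $test  = join '', map { $chars[ int rand @chars ] } 1 .. $size;
    $test .= $delim x 4;

    my $captured = '';
    if ( $test =~ s/^(.*?)\Q$delim\E{4}//s ) {
        $captured = $1;
    }
    return ( length $test, length $captured );
}

# 32767 is the size from the report (32771 - 4); its neighbours are controls.
for my $size ( 32766, 32767, 32768 ) {
    my ( $left, $got ) = lazy_match_lengths($size);
    printf "size %d: %s\n", $size,
        ( $left == 0 && $got == $size ) ? 'ok' : 'not ok';
}
```

On a perl that contains the fix, all three sizes should report `ok`; on an affected build only the middle size would be expected to fail, going by the report.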
No.1 | | 473 bytes |
| 
demerphq wrote:
> Attached patch fixes the problem.
Thanks, applied as change #28417.
> We should probably convert the sample code to a test. Attached is a
> modified version of the OP's test to use. I changed it to stay in the
> "visible" range of characters, as the re debug code doesn't play nicely
> with control chars and nulls, so you can't see what's going on. Which in
> itself should be a TODO, as it should be possible to debug such cases.
No.2 | | 620 bytes |
| 
On 6/23/06, Rafael Garcia-Suarez <rgarciasuarez (AT) mandriva (DOT) com> wrote:
> demerphq wrote:
> > Attached patch fixes the problem.
> Thanks, applied as change #28417.
> > We should probably convert the sample code to a test. Attached is a
> > modified version of the OP's test to use. I changed it to stay in the
> > "visible" range of characters, as the re debug code doesn't play nicely
> > with control chars and nulls, so you can't see what's going on. Which in
> > itself should be a TODO, as it should be possible to debug such cases.
Patch to add a test for this bug is attached.
Cheers,
Yves
No.3 | | 770 bytes |
| 
On Sun, Jul 02, 2006 at 03:13:20PM +0200, demerphq wrote:
> On 6/23/06, Rafael Garcia-Suarez <rgarciasuarez (AT) mandriva (DOT) com> wrote:
> > demerphq wrote:
> > > Attached patch fixes the problem.
> >
> > Thanks, applied as change #28417.
> >
> > > We should probably convert the sample code to a test. Attached is a
> > > modified version of the OP's test to use. I changed it to stay in the
> > > "visible" range of characters, as the re debug code doesn't play nicely
> > > with control chars and nulls, so you can't see what's going on. Which in
> > > itself should be a TODO, as it should be possible to debug such cases.
> >
> Patch to add a test for this bug is attached.
thanks, applied as change #28462.
No.4 | | 1183 bytes |
| 
On 7/2/06, Dave Mitchell <davem (AT) iabyn (DOT) com> wrote:
> On Sun, Jul 02, 2006 at 03:13:20PM +0200, demerphq wrote:
> > On 6/23/06, Rafael Garcia-Suarez <rgarciasuarez (AT) mandriva (DOT) com> wrote:
> > > demerphq wrote:
> > > > Attached patch fixes the problem.
> > >
> > > Thanks, applied as change #28417.
> > >
> > > > We should probably convert the sample code to a test. Attached is a
> > > > modified version of the OP's test to use. I changed it to stay in the
> > > > "visible" range of characters, as the re debug code doesn't play nicely
> > > > with control chars and nulls, so you can't see what's going on. Which in
> > > > itself should be a TODO, as it should be possible to debug such cases.
> > >
> > Patch to add a test for this bug is attached.
> thanks, applied as change #28462.
Thanks.
Attached is a patch to resolve the issue of escaped chars in the
string being matched. It's maybe a little crude, but according to
chatter on #p5p there isn't a generalized routine for this type of
purpose. (I might follow up with a patch creating such a routine, but
for now this is useful.)
Cheers,
yves
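
To illustrate what escaping control characters for debug display involves, here is a minimal Perl sketch. It is not the C patch to regexec.c discussed in this thread; the sub name `escape_for_display` and the choice of `\xNN` notation are assumptions made for illustration, and the sketch only targets byte strings.

```perl
use strict;
use warnings;

# Hypothetical helper: make control characters, NULs, high bytes and
# backslashes visible so a string can be printed in debug output.
sub escape_for_display {
    my ($str) = @_;
    # Short names for the common cases; everything else becomes \xNN.
    my %named = ( "\n" => '\n', "\t" => '\t', "\r" => '\r', "\\" => '\\\\' );
    $str =~ s{([\\\x00-\x1f\x7f-\xff])}{
        exists $named{$1} ? $named{$1} : sprintf( '\\x%02x', ord $1 )
    }ge;
    return $str;
}

# The NUL delimiters from the original report become readable:
print escape_for_display("ab\0\0\tcd\n"), "\n";    # prints: ab\x00\x00\tcd\n
```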
No.5 | | 433 bytes |
| 
demerphq wrote:
> Attached is a patch to resolve the issue of escaped chars in the
> string being matched. It's maybe a little crude, but according to
> chatter on #p5p there isn't a generalized routine for this type of
> purpose. (I might follow up with a patch creating such a routine, but
> for now this is useful.)
If I've read my backlog correctly, using pv_display might be a better
approach there, right?
No.6 | | 1428 bytes |
| 
On 7/3/06, Rafael Garcia-Suarez <rgarciasuarez (AT) mandriva (DOT) com> wrote:
> demerphq wrote:
> > Attached is a patch to resolve the issue of escaped chars in the
> > string being matched. It's maybe a little crude, but according to
> > chatter on #p5p there isn't a generalized routine for this type of
> > purpose. (I might follow up with a patch creating such a routine, but
> > for now this is useful.)
> If I've read my backlog correctly, using pv_display might be a better
> approach there, right?
Actually, as the Germans would say, "jein". :-)
pv_display contains very similar code, but it has the annoyance that
it puts quotes on, and I'm not certain whether it's appropriate in
terms of length handling. (As I only found out about pv_display() this
morning, I didn't have time to look into it properly.) Which makes me
think it needs to be refactored so that the current implementation is
a wrapper around a core routine more like the one I posted separately.
Then regexec.c could use the core routine.
But until then, IMO the patch to regexec.c should go in, as it does no
harm and does do good. (Perfect being the enemy of good.) At the
very least it will make it easier to look into bug reports involving
strings with control chars in them. Also, once the new routine is
created, it's a pretty straightforward delete/replace to switch over.
Yves
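
The refactoring idea above (a quote-free core routine, with the quoting display function as a thin wrapper around it) can be sketched in Perl as follows. This only illustrates the layering; it is not the real pv_display or the routine posted in this thread, and the names `escape_core` and `display_quoted`, the optional length limit, and the `\xNN` notation are all assumptions. It assumes byte strings.

```perl
use strict;
use warnings;

# Core routine (hypothetical): escape only, no quotes. The optional $max
# caps the length of the escaped output, never cutting an escape in half.
sub escape_core {
    my ( $str, $max ) = @_;
    my $out = '';
    for my $ch ( split //, $str ) {
        # Printable ASCII except backslash passes through unchanged.
        my $piece = $ch =~ /[\x20-\x5b\x5d-\x7e]/
            ? $ch
            : sprintf( '\\x%02x', ord $ch );
        last if defined $max && length($out) + length($piece) > $max;
        $out .= $piece;
    }
    return $out;
}

# Wrapper (hypothetical): the same escaping plus the surrounding quotes.
sub display_quoted {
    my ( $str, $max ) = @_;
    return '"' . escape_core( $str, $max ) . '"';
}

print escape_core("a\0b"),    "\n";    # prints: a\x00b
print display_quoted("a\0b"), "\n";    # prints: "a\x00b"
```

With this split, a caller that wants no quotes (such as regex debug output) can use the core routine directly, while existing callers of the quoting form keep their behavior.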
No.7 | | 1186 bytes |
| 
On 7/3/06, demerphq <demerphq (AT) gmail (DOT) com> wrote:
> On 7/3/06, Rafael Garcia-Suarez <rgarciasuarez (AT) mandriva (DOT) com> wrote:
> > demerphq wrote:
> > > Attached is a patch to resolve the issue of escaped chars in the
> > > string being matched. It's maybe a little crude, but according to
> > > chatter on #p5p there isn't a generalized routine for this type of
> > > purpose. (I might follow up with a patch creating such a routine, but
> > > for now this is useful.)
> > If I've read my backlog correctly, using pv_display might be a better
> > approach there, right?
> Actually, as the Germans would say, "jein". :-)
> pv_display contains very similar code, but it has the annoyance that
> it puts quotes on, and I'm not certain whether it's appropriate in
> terms of length handling. (As I only found out about pv_display() this
> morning, I didn't have time to look into it properly.) Which makes me
> think it needs to be refactored so that the current implementation is
> a wrapper around a core routine more like the one I posted separately.
> Then regexec.c could use the core routine.
And here it is. (Requires regen.pl)
Cheers,
Yves
No.8 | | 627 bytes |
| 
demerphq wrote:
> > pv_display contains very similar code, but it has the annoyance that
> > it puts quotes on, and I'm not certain whether it's appropriate in
> > terms of length handling. (As I only found out about pv_display() this
> > morning, I didn't have time to look into it properly.) Which makes me
> > think it needs to be refactored so that the current implementation is
> > a wrapper around a core routine more like the one I posted separately.
> > Then regexec.c could use the core routine.
> And here it is. (Requires regen.pl)
Thanks, applied as change #28490 (with a bit of reindentation)