BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:
I've discovered a peculiarity which may be a bug, but I'm not sure yet. It affects ActiveState (AS) build 802 (5.8.0), but not AS build 633.
If you use an escaped # in a regex compiled with qr//x, a newline is embedded in the compiled regex (visible when you print it). As the output below shows, I tried various ways of escaping the # to no avail.
#! perl -slw
use strict;
my $re_a = qr[\w+\#];
my $re_b = qr[\w+ \# ]x;
my $re_c = qr[\w+ [#] ]x;
my $re_d = qr[\w+ \Q#\E ]x;
print $re_a;
print $re_b;
print $re_c;
print $re_d;
__END__
(?-xism:\w+\#)
(?x-ism:\w+ \#
)
(?x-ism:\w+ [#]
)
(?x-ism:\w+ \#\\E\
)
I think this is a bug, as it appears to be trying to match a \n at that position, but I'm having trouble confirming this. Any ideas how to verify this is the case?
Also, does this affect non-AS builds of 5.8?
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
Re: qr//x and \# weirdness (AS 5.8?)
by Chmrr (Vicar) on May 28, 2003 at 05:28 UTC
/*
* If /x was used, we have to worry about a regex
* ending with a comment later being embedded
* within another regex. If so, we don't want this
* regex's "commentization" to leak out to the
* right part of the enclosing regex, we must cap
* it with a newline.
*
* So, if /x was used, we scan backwards from the
* end of the regex. If we find a '#' before we
* find a newline, we need to add a newline
* ourself. If we find a '\n' first (or if we
* don't find '#' or '\n'), we don't need to add
* anything. -jfriedl
*/
This is only a display bug; it doesn't affect the way the regex matches.
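A minimal sketch for checking Chmrr's point, assuming only core Perl (5.6 or later); the variable names are illustrative. The escaped-# pattern matches a target that contains no newline, and a qr//x object whose source really does end in a comment can still be embedded in a bigger pattern, because the newline cap terminates the comment.

```perl
use strict;
use warnings;

# An escaped '#' under /x is a literal '#', not the start of a comment.
my $escaped = qr[\w+ \# ]x;

# The target has no newline, yet it matches: the newline that 5.8.0
# shows when printing $escaped is not part of what gets matched.
print "escaped:          ", ( "foo#" =~ $escaped ? "match" : "no match" ), "\n";

# A pattern whose source really does end in a comment.
my $commented = qr[a # trailing comment]x;

# Interpolating it into a larger pattern uses its string form. The
# newline cap ends the comment, so the closing paren and the 'b' are
# still part of the pattern instead of being commented out.
my $outer = qr[${commented}b];
print "embedded comment: ", ( "ab" =~ $outer ? "match" : "no match" ), "\n";

# Embedding the escaped-# pattern is harmless too: under its own /x
# flag any extra newline is just ignorable whitespace.
my $outer2 = qr[${escaped}!];
print "embedded escaped: ", ( "foo#!" =~ $outer2 ? "match" : "no match" ), "\n";
```

All three lines should report a match; the newline in the printed form never has to match anything.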
It looks like the scan doesn't look earlier than the # to see whether the # is preceded by a \. Unfortunately, I don't think the fix is that simple either, as qr[ \\# ] should get the newline: there the backslash itself is escaped, so the # really does start a comment.
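The backwards scan described in that comment, plus the backslash check, can be sketched in Perl. `needs_newline_cap` is a hypothetical helper, not something perl provides, and it works on the raw pattern source just as the C loop does. So it still reports a cap for a # inside a character class such as [#], which is harmless because the extra newline is ignored under /x.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the sv.c scan plus a backslash check:
# walk backwards from the end of the pattern source; a newline means no
# cap is needed, an unescaped '#' means the pattern ends inside a comment.
sub needs_newline_cap {
    my ($src) = @_;
    my $i = length($src) - 1;
    while ( $i >= 0 ) {
        my $c = substr( $src, $i--, 1 );
        return 0 if $c eq "\n";    # newline after any comment: no cap needed
        next unless $c eq '#';

        # Count the backslashes immediately before the '#'.
        my $n = 0;
        while ( $i >= 0 && substr( $src, $i, 1 ) eq '\\' ) {
            $n++;
            $i--;
        }
        next if $n % 2;            # odd: the '#' is escaped, keep scanning
        return 1;                  # even (including none): a real comment
    }
    return 0;                      # no unescaped '#' found
}

# Pattern sources as they appear between the qr delimiters.
my @cases = (
    [ 'a # comment',   1 ],    # genuine trailing comment
    [ "a # comment\n", 0 ],    # already ends with a newline
    [ '\w+ \# ',       0 ],    # escaped '#', the case from the question
    [ ' \\\\# ',       1 ],    # qr[ \\# ]: escaped backslash, then a real '#'
    [ 'a # x \# ',     1 ],    # escaped '#' inside an earlier comment
);

for my $case (@cases) {
    my ( $src, $want ) = @$case;
    ( my $show = $src ) =~ s/\n/\\n/g;    # make a trailing newline visible
    printf "%-16s => %d (expected %d)\n", "'$show'", needs_newline_cap($src), $want;
}
```

The last case is why stopping at the first escaped # is not enough: that # sits inside a real comment, so the scan has to keep going.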
perl -pe '"I lo*`+$^X$\"$]!$/"=~m%(.*)%s;$_=$1;y^`+*^e v^#$&V"+@( NO CARRIER'
At line 2992 of sv.c (5.8.0), with the proposed additions marked +:
    if (PMf_EXTENDED & re->reganch)
    {
        char *endptr = re->precomp + re->prelen;
        while (endptr >= re->precomp)
        {
            char c = *(endptr--);
            if (c == '\n')
                break; /* don't need another */
            if (c == '#') {
+               int n = 0;
+               /* count the backslashes immediately before the # */
+               while (endptr >= re->precomp && *endptr == '\\') {
+                   n++;
+                   endptr--;
+               }
+               /* an odd number of backslashes means the # is escaped,
+                  so it doesn't start a comment; keep scanning for an
+                  earlier, unescaped # */
+               if (n & 1)
+                   continue;
                /* we end while in a comment, so we
                   need a newline */
                mg->mg_len++; /* save space for it */
                need_newline = 1; /* note to add it */
                break; /* one newline is enough */
            }
        }
    }
Re: qr//x and \# weirdness (AS 5.8?)
by Enlil (Parson) on May 28, 2003 at 04:59 UTC
as it appears to be trying to match a \n at that position
Well, here is what it does appear to match on AS Perl 5.8 and on Perl 5.8 built from source on RH Linux (with a little help from use re 'debug'):
#! perl -slw
use strict;
#use re 'debug';
my $re_b = qr[\w+ #]x;
my $re_c = qr[\w+ [#] ]x;
my $re_d = qr[\w+ \Q#\E ]x;
print $re_b if "foo" =~ $re_b; #matches /\w+/
print $re_c if "foo#" =~ $re_c; #matches /\w+#/
print $re_d if "fldkdafds#\\E " =~ $re_d; #matches /\w+#\\E /;
-enlil
Okay :) So my regex (the bigger original one where I discovered this) probably isn't failing to match because of this peculiarity, and I was just grasping at a straw. But it was a pretty good straw :)
That the \E in \Q#\E is being escaped isn't quite right though.
Re: qr//x and \# weirdness (AS 5.8?)
by PodMaster (Abbot) on May 28, 2003 at 04:40 UTC
Also, does this affect non-AS builds of 5.8?
Yes. This be very very very very serious.
I couldn't test on perl-5.8.x as it doesn't currently build (same goes for bleadperl -- 5.9.x), but at least perl-5.6.x is fine (the upcoming perl-5.6.2 ;d)
update:
Well putting \x23 (aka '#') yields expected results (?x-ism:\w+ \x23 ).
I highly doubt that it "appears to be trying to match a \n at that position",
but it would still be nice if the re were all on one line, like we've come to expect.
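A small sketch comparing the three spellings on the test string used below, assuming only core Perl. Because the source of the \x23 version contains no literal #, the scan never has a reason to add a newline to its printed form, while all three should find the same four matches.

```perl
use strict;
use warnings;

my $escaped = qr[\w+ \# ]x;     # '#' escaped with a backslash
my $hex     = qr[\w+ \x23 ]x;   # the same character as a hex escape
my $class   = qr[\w+ [#] ]x;    # the same character in a character class

my $text = "the# stringy# the# dude# foy ";
for my $re ( $escaped, $hex, $class ) {
    my @m = $text =~ /$re/g;
    printf "%d matches, newline in string form: %s\n",
        scalar @m, ( "$re" =~ /\n/ ? 'yes' : 'no' );
}
```

Whether the first and third report "yes" depends on the perl build; the match counts should not.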
update:
Check this out
#! perl -slw
use strict;
my( @for ) =
(
qr[\w+\#],
qr[\w+ \# ]x,
qr[\w+ [#] ]x,
qr[\w+ \Q#\E ]x,
qr[\w+ \x23 ]x,
);
print for @for;
my $r = "the# stringy# the# dude# foy ";
print $r;
for my $s( @for ) {
my( @m ) = $r =~ /$s/g;
warn scalar @m;
}
__END__
# on 5.8
##########################
(?-xism:\w+\#)
(?x-ism:\w+ \#
)
(?x-ism:\w+ [#]
)
(?x-ism:\w+ \#\\E\
)
(?x-ism:\w+ \x23 )
the# stringy# the# dude# foy
4 at - line 21.
4 at - line 21.
4 at - line 21.
0 at - line 21.
4 at - line 21.
# on 5.6
##########################
(?-xism:\w+\#)
(?x-ism:\w+ \# )
(?x-ism:\w+ [#] )
(?x-ism:\w+ \#\\E\ )
(?x-ism:\w+ \x23 )
the# stringy# the# dude# foy
4 at - line 21.
4 at - line 21.
4 at - line 21.
0 at - line 21.
4 at - line 21.
MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
I run a Win32 PPM repository for perl 5.6x+5.8x. I take requests.
** The Third rule of perl club is a statement of fact: pod is sexy.
Hmm. Maybe the original problem I was trying to track down is nothing to do with this, but it is still weird.
Even your 5.6 output shows that \Q#\E doesn't work the way you (er..I) would expect. The #, \, E & the following space are all being escaped?
Thanks for the \x23 idea. That may get me around this?
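On the \Q#\E oddity: perlop's "Gory details of parsing quoted constructs" says no processing is performed inside a #-comment of an /x regex, so a likely explanation is that the tokenizer takes the # after \Q as the start of a comment, never sees the \E, and lets \Q run to the end of the pattern. If that is right, escaping the # outside the pattern literal sidesteps it. A sketch, with `$hash` as an illustrative name:

```perl
use strict;
use warnings;

# quotemeta runs at run time, outside the pattern literal, so the
# tokenizer never sees a bare '#' that it could treat as a /x comment.
my $hash = quotemeta '#';      # the two characters \#
my $re   = qr[\w+ $hash ]x;    # compiles as \w+ \# under /x

print "matches 'foo#'\n"     if "foo#" =~ $re;
print "no match for 'foo'\n" unless "foo" =~ $re;
```

The printed form of $re may still show the trailing newline on 5.8.0, since that is the separate display issue.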
Re: qr//x and \# weirdness (AS 5.8?)
by tedrek (Pilgrim) on May 28, 2003 at 04:56 UTC
I got the same results on 5.8 under Linux. However, I did discover this bit, which didn't behave as expected:
my $re_e = qr[\w+ \# f]x;
print $re_e;
__END__
(?x-ism:\w+ \# f
)
which kinda looks like the newline is added at the end of the regex, not right after the #. *shrug*
Re: qr//x and \# weirdness (AS 5.8?)
by djantzen (Priest) on May 28, 2003 at 04:43 UTC
It appears to be fine under 5.6.1 on Solaris, but I get the same results as you under 5.8 on Linux.
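For comparing builds like this, a sketch along these lines prints the perl version and whether each stringified regex picked up a newline. It assumes only core Perl; the labels in `%patterns` are made up for the example, and the patterns mirror the ones from the question.

```perl
use strict;
use warnings;

# The patterns from the question, keyed by a short label.
my %patterns = (
    escaped => qr[\w+ \# ]x,
    class   => qr[\w+ [#] ]x,
    hex     => qr[\w+ \x23 ]x,
    plain   => qr[\w+\#],          # no /x, so no cap is ever needed
);

printf "perl v%vd on %s\n", $^V, $^O;
for my $name ( sort keys %patterns ) {
    my $str = "$patterns{$name}";
    printf "%-8s newline in string form: %s\n",
        $name, ( $str =~ /\n/ ? 'yes' : 'no' );
}
```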
"The dead do not recognize context" -- Kai, Lexx