PerlMonks  

qr//x and \# weirdness (AS 5.8?)

by BrowserUk (Patriarch)
on May 28, 2003 at 04:17 UTC ([id://261206])

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

I've discovered a peculiarity, which may be a bug, but I'm not sure yet. It affects ActiveState (AS) build 802 (Perl 5.8.0), but not AS build 633.

If you try to use an escaped # in a regex using qr//x, it embeds a newline in the compiled regex. As you can see, I tried various methods of escaping the # to no avail.

    #! perl -slw
    use strict;

    my $re_a = qr[\w+\#];
    my $re_b = qr[\w+ \# ]x;
    my $re_c = qr[\w+ [#] ]x;
    my $re_d = qr[\w+ \Q#\E ]x;

    print $re_a;
    print $re_b;
    print $re_c;
    print $re_d;

    __END__
    (?-xism:\w+\#)
    (?x-ism:\w+ \# 
    )
    (?x-ism:\w+ [#] 
    )
    (?x-ism:\w+ \#\\E\ 
    )

I think this is a bug, as it appears to be trying to match a \n at that position, but I'm having trouble confirming this. Any ideas how to verify this is the case?
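One way to check this from the Perl side (a sketch; $re_b is copied from the script above, the other variable names are illustrative): anchor the compiled regex at both ends and try a subject that contains no newline. If the embedded newline were actually being matched, the anchored match against "foo#" would fail. Under /x a literal newline in the pattern is just whitespace, so the expectation is that it matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The /x pattern from the question whose printed form showed a newline.
my $re_b = qr[\w+ \# ]x;

# Display side: does the stringified pattern contain a newline?
# (This depends on the perl version, so it is only reported.)
my $shows_newline = ("$re_b" =~ /\n/) ? 1 : 0;

# Matching side: anchor at both ends and use a subject with no newline.
my $matches_plain = ("foo#" =~ /\A$re_b\z/) ? 1 : 0;

# With a real newline before the end, \z should make the match fail.
my $matches_with_nl = ("foo#\n" =~ /\A$re_b\z/) ? 1 : 0;

print "stringified form contains a newline: $shows_newline\n";
print "matches 'foo#' anchored:             $matches_plain\n";
print "matches \"foo#\\n\" anchored:          $matches_with_nl\n";
```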

Also, does this affect non-AS builds of 5.8?


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Re: qr//x and \# weirdness (AS 5.8?)
by Chmrr (Vicar) on May 28, 2003 at 05:28 UTC

    The problem is around line 3000 in sv.c; the comment there reads:

        /*
         * If /x was used, we have to worry about a regex
         * ending with a comment later being embedded
         * within another regex. If so, we don't want this
         * regex's "commentization" to leak out to the
         * right part of the enclosing regex, we must cap
         * it with a newline.
         *
         * So, if /x was used, we scan backwards from the
         * end of the regex. If we find a '#' before we
         * find a newline, we need to add a newline
         * ourself. If we find a '\n' first (or if we
         * don't find '#' or '\n'), we don't need to add
         * anything.  -jfriedl
         */

    This is only a display bug -- it doesn't affect the way the regex matches.

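    It looks like the code doesn't look before the # to see whether it is preceded by a \. Unfortunately, I don't think the fix is that easy, either, as qr[ \\# ] under /x should still get the newline: there the backslash escapes the other backslash, and the # really does start a comment.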
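    To see both why the cap exists and why it misfires on \#, here is a small Perl model of the logic in that comment. naive_needs_newline() is a hypothetical helper written for illustration, not a perl internal; it only mirrors the backwards scan the comment describes. The first part builds the embedded patterns as strings so that a real newline character (not the two characters \n) can be appended.

```perl
use strict;
use warnings;

# 1. Why the cap exists: a /x fragment that ends in a comment, embedded in
#    a larger pattern, swallows whatever follows it.
my $inner = 'a # trailing comment';

# Without a newline the comment runs to the end of the whole pattern and
# eats the closing ")b", so the group is never closed and compilation dies.
my $uncapped_src   = "(?x:$inner)b";
my $uncapped       = eval { qr/$uncapped_src/ };
my $uncapped_error = $@;

# With a real newline appended, the comment stops there.
my $capped_src = "(?x:$inner\n)b";
my $capped     = eval { qr/$capped_src/ };

print "uncapped compiles: ", (defined $uncapped ? "yes" : "no"), "\n";
print "capped matches 'ab': ",
    ((defined $capped && "ab" =~ $capped) ? "yes" : "no"), "\n";

# 2. The backwards scan the comment describes.
sub naive_needs_newline {
    my ($src) = @_;
    for my $c (reverse split //, $src) {
        return 0 if $c eq "\n";   # newline seen first: already capped
        return 1 if $c eq '#';    # '#' seen first: assume a trailing comment
    }
    return 0;                     # neither seen
}

# The scan has no idea whether the '#' is escaped, so \# trips it.
for my $src ('\w+ \# ', '\w+ \x23 ', "a # note\n") {
    (my $shown = $src) =~ s/\n/\\n/g;
    printf "%-12s needs newline: %d\n", $shown, naive_needs_newline($src);
}
```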

    perl -pe '"I lo*`+$^X$\"$]!$/"=~m%(.*)%s;$_=$1;y^`+*^e v^#$&V"+@( NO CARRIER'

      I don't have bleadperl, but something like this (simulated!) diff might work? (I scrunched it a bit.)

          @ 2992 sv.c (5.8.0)
           if (PMf_EXTENDED & re->reganch)
           {
               char *endptr = re->precomp + re->prelen;
               while (endptr >= re->precomp)
               {
                   char c = *(endptr--);
                   if (c == '\n')
                       break; /* don't need another */
                   if (c == '#') {
          +            /* count the backslashes immediately before the '#';
          +               peek rather than consume, so the character that
          +               stops the count is still seen by the outer loop */
          +            int n = 0;
          +            while (endptr >= re->precomp && *endptr == '\\') {
          +                n++;
          +                endptr--;
          +            }
          +            /* an odd number of backslashes means the '#' is
          +               escaped, so it doesn't start a comment: keep scanning */
          +            if (n & 1)
          +                continue;
                       /* we end while in a comment, so we need a newline */
                       mg->mg_len++; /* save space for it */
                       need_newline = 1; /* note to add it */
                       break;
                   }
               }
           }
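      The backslash-counting idea can be prototyped in Perl before touching sv.c. needs_newline() below is a hypothetical helper, not perl's code. It treats an escaped '#' as "keep scanning" rather than "stop", so that a \# sitting inside an earlier comment is still caught. It still doesn't understand character classes, so [#] remains a false positive; the test cases are illustrative.

```perl
use strict;
use warnings;

# Walk backwards from the end of the pattern source: a newline means no
# cap is needed, an unescaped '#' means one is. An odd run of backslashes
# before a '#' means the '#' is escaped, so the scan keeps looking back.
sub needs_newline {
    my ($src) = @_;
    my @c = split //, $src;
    my $i = $#c;
    while ($i >= 0) {
        my $ch = $c[$i--];
        return 0 if $ch eq "\n";       # already capped
        next unless $ch eq '#';

        # Count the backslashes right before the '#', peeking so that the
        # first non-backslash character is still examined by the loop.
        my $n = 0;
        while ($i >= 0 && $c[$i] eq '\\') {
            $n++;
            $i--;
        }
        next if $n % 2;                # escaped '#': not a comment
        return 1;                      # unescaped '#': trailing comment
    }
    return 0;
}

my @cases = (
    [ '\w+ \# ',      0 ],   # escaped '#': a literal, no comment
    [ '\w+ \\\\# ',   1 ],   # \\# : escaped backslash, then a real comment
    [ 'a # note \# ', 1 ],   # the \# sits inside an earlier comment
    [ "a # note\n",   0 ],   # already ends in a newline
    [ '\w+ [#] ',     1 ],   # false positive: [#] is not understood
);

for my $case (@cases) {
    my ($src, $want) = @$case;
    (my $shown = $src) =~ s/\n/\\n/g;
    printf "%-14s got %d, expected %d\n", $shown, needs_newline($src), $want;
}
```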



Re: qr//x and \# weirdness (AS 5.8?)
by PodMaster (Abbot) on May 28, 2003 at 04:40 UTC
    Also, does this affect non-AS builds of 5.8?
    Yes. This be very very very very serious.

    I couldn't test on perl-5.8.x as it doesn't currently build (same goes for bleadperl -- 5.9.x), but at least perl-5.6.x is fine (the upcoming perl-5.6.2 ;d)

    Update: using \x23 (aka '#') instead yields the expected result: (?x-ism:\w+ \x23 ).

    I highly doubt that it's trying to match a \n at that position, but it would still be nice if the re were all on one line, as we have come to expect.

    Update: check this out:

        #! perl -slw
        use strict;

        my( @for ) = (
            qr[\w+\#],
            qr[\w+ \# ]x,
            qr[\w+ [#] ]x,
            qr[\w+ \Q#\E ]x,
            qr[\w+ \x23 ]x,
        );

        print for @for;

        my $r = "the# stringy# the# dude# foy ";

        print $r;

        for my $s( @for ) {
            my( @m ) = $r =~ /$s/g;

            warn scalar @m;
        }

        __END__
        # on 5.8 ##########################
        (?-xism:\w+\#)
        (?x-ism:\w+ \# 
        )
        (?x-ism:\w+ [#] 
        )
        (?x-ism:\w+ \#\\E\ 
        )
        (?x-ism:\w+ \x23 )
        the# stringy# the# dude# foy 
        4 at - line 21.
        4 at - line 21.
        4 at - line 21.
        0 at - line 21.
        4 at - line 21.
        # on 5.6 ##########################
        (?-xism:\w+\#)
        (?x-ism:\w+ \# )
        (?x-ism:\w+ [#] )
        (?x-ism:\w+ \#\\E\ )
        (?x-ism:\w+ \x23 )
        the# stringy# the# dude# foy 
        4 at - line 21.
        4 at - line 21.
        4 at - line 21.
        0 at - line 21.
        4 at - line 21.


    MJD says you can't just make shit up and expect the computer to know what you mean, retardo!
    I run a Win32 PPM repository for perl 5.6x+5.8x. I take requests.
    ** The Third rule of perl club is a statement of fact: pod is sexy.

      Hmm. Maybe the original problem I was trying to track down has nothing to do with this, but it is still weird.

      Even your 5.6 output shows that \Q#\E doesn't work the way you (er..I) would expect. The #, \, E & the following space are all being escaped?

      Thanks for the \x23 idea. That may get me around this?
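      Two ways to get a literal '#' into a /x pattern without writing \# or \Q#\E in the source, counted against PodMaster's test string. This is a sketch: $re_hex, $re_quote and $literal are illustrative names, and it makes no claim about how \Q#\E itself is handled on any particular perl. The second option simply moves the quoting out of the pattern, using quotemeta, and interpolates the result.

```perl
use strict;
use warnings;

my $subject = "the# stringy# the# dude# foy ";

# Option 1: write the '#' as a hex escape, so no '#' character appears in
# the pattern source at all.
my $re_hex = qr[\w+ \x23 ]x;

# Option 2: do the quoting outside the /x pattern with quotemeta and
# interpolate the result, instead of relying on \Q...\E inside it.
my $literal  = quotemeta '#';           # the two characters \#
my $re_quote = qr[\w+ $literal ]x;      # the regex engine sees \w+ \#

my @hex_matches   = $subject =~ /$re_hex/g;
my @quote_matches = $subject =~ /$re_quote/g;

# Anchored check: only "word#" should match, with no trailing "\E ".
my $stray = ("foo#\\E " =~ /\A$re_quote\z/) ? 1 : 0;

printf "\\x23 form:      %d matches\n", scalar @hex_matches;
printf "quotemeta form: %d matches\n", scalar @quote_matches;
print  "matches 'foo#\\E ' anchored: $stray\n";
```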




Re: qr//x and \# weirdness (AS 5.8?)
by Enlil (Parson) on May 28, 2003 at 04:59 UTC
    as it appears to be trying to match a \n at that position

    Well, here is what it does appear to match on AS Perl 5.8 and on Perl 5.8 built from source on RH Linux (with a little help from use re 'debug'):

        #! perl -slw
        use strict;
        #use re 'debug';

        my $re_b = qr[\w+ #]x;
        my $re_c = qr[\w+ [#] ]x;
        my $re_d = qr[\w+ \Q#\E ]x;

        print $re_b if "foo" =~ $re_b;              # matches /\w+/
        print $re_c if "foo#" =~ $re_c;             # matches /\w+#/
        print $re_d if "fldkdafds#\\E " =~ $re_d;   # matches /\w+#\\E /

    -enlil

      Okay:) So the problem with my regex (the bigger original one where I discovered this) not matching is probably not to do with this peculiarity, and I was just grabbing a straw. But it was a pretty good straw:)

      That the \E in \Q#\E is being escaped isn't quite right though.




Re: qr//x and \# weirdness (AS 5.8?)
by tedrek (Pilgrim) on May 28, 2003 at 04:56 UTC
    I got the same results on 5.8 under Linux; however, I did discover this bit, which didn't behave as expected:

        my $re_e = qr[\w+ \# f]x;
        print $re_e;

        __END__
        (?x-ism:\w+ \# f
        )

    which kinda looks like the newline is at the end of the regex. *shrug*

Re: qr//x and \# weirdness (AS 5.8?)
by djantzen (Priest) on May 28, 2003 at 04:43 UTC

    It appears to be fine under 5.6.1 on Solaris, but I get the same results as you under 5.8 on Linux.


    "The dead do not recognize context" -- Kai, Lexx

Node Type: perlquestion [id://261206]
Approved by blokhead