PerlMonks  

Regex bug? (/u not cooperating with /x)

by mrpeabody (Friar) on Oct 24, 2007 at 04:04 UTC
mrpeabody has asked for the wisdom of the Perl Monks concerning the following question:

This works:

    $ perl -we'my $x = "bar"; print "yes" if "fooBarbaz" =~ /\u$x/x'
    yes

This doesn't:

    $ perl -we'my $x = "bar"; print "yes" if "fooBarbaz" =~ /\u $x/x'

In other words, the \u operator can't be separated from the following atom by whitespace, even under /x. If it is, it silently has no effect. The same thing happens if you use the string "bar" directly instead of $x.

Perhaps this is just an artifact of \u being more of a double-quote interpolation operator, as opposed to a regex operator proper. But it seems odd (wrong) that "\b $word" works fine while "\u $word" fails.
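The contrast can be reproduced in one script. This is a minimal sketch that assumes perl 5.8 or later. `$word` and the sample strings are illustrative, and the expected results simply restate the behaviour reported above. `\b` survives the space because the regex engine is the one that sees it. `\u`, by contrast, has already been applied to the space during interpolation.

```perl
use strict;
use warnings;

my $word = "bar";

# \b is a regex assertion: the regex engine sees "\b bar", and /x drops the space.
my $with_b = ("foo bar" =~ /\b $word/x) ? 1 : 0;

# \u is an interpolation escape: it upcases the space (a no-op), so the
# regex engine receives " bar" and /x reduces that to a lowercase "bar".
my $with_u = ("fooBarbaz" =~ /\u $word/x) ? 1 : 0;

# Adjacent to the variable, \u upcases the "b" before the regex is compiled.
my $adjacent = ("fooBarbaz" =~ /\u$word/x) ? 1 : 0;

print "\\b: $with_b, \\u with space: $with_u, \\u adjacent: $adjacent\n";
# \b: 1, \u with space: 0, \u adjacent: 1
```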

Is there a chance of getting this changed, or is there some good reason why it works this way?

This is perl, v5.8.8 built for cygwin-thread-multi-64int

Re: Regex bug? (/u not cooperating with /x)
by duff (Vicar) on Oct 24, 2007 at 04:18 UTC

    I don't know if I'd call it a bug. \u is behaving as advertised. It's upcasing the next char (which happens to be a space) :-)

    But I would agree to a change in spec and implementation that says it upcases the next non-whitespace char when /x is in effect. Too bad you'll have to wait for 5.12 if it isn't already changed in 5.10, though, since I think 5.10 is in feature freeze.
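duff's explanation can be checked outside a regex. This is a minimal sketch that relies only on core Perl. perlfunc documents `ucfirst` as the function that implements the `\u` escape in double-quoted strings. So `\u` acts on whatever character comes next, including a space.

```perl
use strict;
use warnings;

my $x = "bar";

# \u is ucfirst applied during interpolation, so it affects only the very
# next character of the string being built, whatever that character is.
my $adjacent   = "\u$x";     # "Bar"
my $with_space = "\u $x";    # " bar": the space was "upcased", which changes nothing

# The same results spelled out with ucfirst.
my $same_adjacent   = ucfirst($x);          # "Bar"
my $same_with_space = ucfirst(" " . $x);    # " bar"

print "[$adjacent] [$with_space]\n";    # [Bar] [ bar]
```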

Re: Regex bug? (/u not cooperating with /x)
by runrig (Abbot) on Oct 24, 2007 at 04:19 UTC
    What's not working? "b" is not uppercase. "B" is. Try "Bar" instead of "bar" (update: wait...according to my logic it's the first one that's not working).

    Update: never mind... I'm not paying attention.

Re: Regex bug? (/u not cooperating with /x)
by GrandFather (Cardinal) on Oct 24, 2007 at 04:23 UTC

    In perlreref in the discussion of Escape Sequences it says:

    \b An assertion, not backspace, except in a character class

    so there is no expectation that \b (an assertion) should have consistent behavior with \u (an escape sequence character).


Re: Regex bug? (/u not cooperating with /x)
by ikegami (Pope) on Oct 24, 2007 at 05:05 UTC

    Spaces are not permitted in escape sequences. That includes \x20, \c[ and \ub.

    \b, if considered an escape sequence, is the escape sequence in its entirety. It can be preceded and followed by anything.

Re: Regex bug? (/u not cooperating with /x) (string vs regex)
by tye (Cardinal) on Oct 24, 2007 at 05:20 UTC

    /x only impacts how the regex is parsed. It can't impact how the string is parsed. \u impacts how the string is generated. The string has to be parsed and generated before the resulting string can be passed to the regex engine which then parses the string as a regex. And it is clear that /x can't impact the parsing of the string because the string has to be parsed before Perl can even see the /x.
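tye's two stages can be made visible by doing the string stage by hand. This is a minimal sketch. It assumes that building the pattern in an ordinary double-quoted string is a fair stand-in for the interpolation a regex literal performs. The name `$pattern` is illustrative.

```perl
use strict;
use warnings;

my $x = "bar";

# Stage 1: string parsing. \u, \U, \l, \L and variable interpolation happen
# here, before the regex engine is involved and before /x means anything.
my $pattern = "\u $x";    # " bar" (the space absorbed the \u)

# Stage 2: regex parsing. /x now discards the space, leaving "bar".
my $matches_upper = ("fooBarbaz" =~ /$pattern/x) ? 1 : 0;   # 0
my $matches_lower = ("foobarbaz" =~ /$pattern/x) ? 1 : 0;   # 1

print "pattern=[$pattern] upper=$matches_upper lower=$matches_lower\n";
```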

    If we fix your "bug", then we should also have to fix this to work:

    "Y" =~ /\u(?x: y)/ or die "\\u and (?x: ) don't cooperate";

    So this is a bit of a restatement of other replies, perhaps, but \u isn't a regex construct, so it isn't impacted by regex flags.
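The `(?x: y)` example above can be checked directly with core Perl. In this minimal sketch, the `\u` lands on the `(` of the group rather than on the `y` inside it. As a result, an uppercase subject does not match.

```perl
use strict;
use warnings;

# \u upcases the next character of the pattern text, which here is "(".
# "(" has no uppercase form, so the pattern compiled is still "(?x: y)",
# and (?x: ) then ignores the space: the regex only matches a lowercase "y".
my $upper = ("Y" =~ /\u(?x: y)/) ? 1 : 0;   # 0
my $lower = ("y" =~ /\u(?x: y)/) ? 1 : 0;   # 1

print "Y: $upper, y: $lower\n";
```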

    You'll also note other things that aren't impacted by /x:

    /$foo bar[5]/x;   # Not the same as /$foobar[5]/
    /\ u2/x;          # Not the same as /\u2/, obviously
    /\01 2/x;         # Not the same as /\012/
    /$hash{a Key}/x;  # Not the same as /$hash{aKey}/
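Two of the lines above can be checked directly. In this minimal sketch the subject strings are chosen purely for illustration. Under /x, `\01 2` is the octal escape `\01`, then an ignored space, then a literal `2`. An escaped space (`\ `) is different: /x keeps it as a literal space.

```perl
use strict;
use warnings;

# Under /x, "\01 2" is the octal escape \01 (chr 1), an ignored space, then "2".
my $octal_then_two = "\x{01}2";
my $newline        = "\n";    # \012 is octal for newline

my $r1 = ($octal_then_two =~ /\01 2/x) ? 1 : 0;   # 1
my $r2 = ($newline        =~ /\01 2/x) ? 1 : 0;   # 0
my $r3 = ($newline        =~ /\012/)   ? 1 : 0;   # 1

# "\ " is an escaped (literal) space, which /x does not discard.
my $r4 = (" u2" =~ /\ u2/x) ? 1 : 0;   # 1
my $r5 = ("U2"  =~ /\ u2/x) ? 1 : 0;   # 0

print "$r1 $r2 $r3 $r4 $r5\n";    # 1 0 1 1 0
```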

    You can also tell that /x impacts regex parsing not string parsing because:

    #!/usr/bin/perl -w
    my $x= " b c ";
    print for "abcd" =~ /$x/xg;   # prints "bc"

    And one other way to see that \u isn't a regex thingy:

    print "($1)($2)\n" while "fFFggGGGG" =~ /([a-z]+)\U(\1*)/g;
    print "---\n";
    print "($1)($2)\n" while "fFFggGGGG" =~ /([a-z]+)\U(\1*g)/g;
    __END__
    Prints:
    (f)()
    (gg)()
    ---
    (gg)(G)

    - tye        

      I see:
      print "($1)($2)\n" while "ggGGGG" =~ /([a-z])\U(\1*g)/g; # (g)(gG)

      \U applies to "\1", not to the "g" that it matches.

      So string parsing and regex parsing are two separate stages, the operators of which look similar but take effect at different times. That makes sense. I do think that a quick note in perlre would make it less surprising.
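Given that split, one way to keep /x-style spacing is to finish the case change before the regex stage. This is a minimal sketch using only core Perl. The sample pattern and variable names are illustrative.

```perl
use strict;
use warnings;

my $x = "bar";

# Workaround sketch: do the case change in ordinary Perl code, then
# interpolate the finished value. /x can then space the pattern freely.
my $word = ucfirst $x;    # "Bar"
my $spaced = ("fooBarbaz" =~ / foo \s* $word /x) ? 1 : 0;   # 1

# Alternatively keep \u glued to the variable it should modify.
my $glued = ("fooBarbaz" =~ / foo \u$x baz /x) ? 1 : 0;     # 1

print "spaced=$spaced glued=$glued\n";
```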

Node Type: perlquestion [id://646816]
Approved by McDarren