PerlMonks  

regex issue (positive look-ahead)

by jeanluca (Deacon)
on Mar 10, 2010 at 15:05 UTC (#827812=perlquestion)
jeanluca has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks

Here is a script which replaces the last part of a URL:

    #! /usr/bin/perl -l
    my $url = "http://example.com/user/1234/";
    $url =~ s![^/]+(?=/$)!_bla_!;
    print "URL = $url";

This works. However, when I change the regex to

    s![^/]+(?=/)$!_bla_!

it doesn't!
Any suggestions why?

cheers
LuCa

UPDATE: thnx a lot, I understand it now!!

Re: regex issue (positive look-ahead)
by Anonymous Monk on Mar 10, 2010 at 15:10 UTC
Re: regex issue (positive look-ahead)
by kennethk (Monsignor) on Mar 10, 2010 at 15:28 UTC
    The difference between the two regexes is the placement of your $ metacharacter, so that should be your red flag. By moving it you are changing your trailing anchor. The first version says:
    1. Match at least one character that is not a slash
    2. It should be followed by (non-consuming):
      1. a slash
      2. the end of string

    The new version says:

    1. Match at least one character that is not a slash
    2. It should be followed by (non-consuming):
      1. a slash
    3. The last consumed element should be followed by the end of string

    In your second regex, the positive look-ahead assertion and the end of string anchor directly conflict. Obligatory documentation refs: perlre, perlretut.
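
    The two readings above can be checked side by side. This is only a minimal sketch: the helper name replace_last_part and the variable names are made up for illustration and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): apply a pattern as a
# substitution and report whether anything was replaced.
sub replace_last_part {
    my ($url, $regex) = @_;
    my $changed = ($url =~ s/$regex/_bla_/) ? 1 : 0;
    return ($url, $changed);
}

# Slash and end of string are both inside the look-ahead.
my $anchor_inside  = qr![^/]+(?=/$)!;
# The look-ahead wants a slash, then the anchor wants end of string.
my $anchor_outside = qr![^/]+(?=/)$!;

for my $regex ($anchor_inside, $anchor_outside) {
    my ($result, $changed) =
        replace_last_part('http://example.com/user/1234/', $regex);
    print "changed=$changed URL = $result\n";
}
```

    With the URL from the question, the first pattern should replace 1234 and the second should leave the string untouched.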

Re: regex issue (positive look-ahead)
by ikegami (Pope) on Mar 10, 2010 at 15:36 UTC
    That's a zero-width positive lookahead. s![^/]+(?=/)$!_bla_! finds non-slashes, looks ahead to make sure they are followed by a slash, then checks whether they are followed by a trailing newline or the end of the string. That can't happen. You can't have something that's followed both by a slash and by a newline, and you can't have something that's followed both by a slash and by nothing.
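
    A small sketch of the "newline or end of string" point, using a hypothetical helper called matches and three made-up test strings. It assumes nothing beyond core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $regex matches anywhere in $string, else 0.
sub matches {
    my ($regex, $string) = @_;
    return $string =~ $regex ? 1 : 0;
}

my $anchor_inside  = qr![^/]+(?=/$)!;
my $anchor_outside = qr![^/]+(?=/)$!;

for my $string ("1234/", "1234/\n", "1234") {
    # Make the newline visible when printing.
    (my $shown = $string) =~ s/\n/\\n/;
    printf "%-8s inside=%d outside=%d\n", $shown,
        matches($anchor_inside,  $string),
        matches($anchor_outside, $string);
}
```

    The version with the anchor inside the look-ahead should match both "1234/" and "1234/\n", since $ also matches just before a final newline. The version with the anchor outside should match none of the three strings.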
Re: regex issue (positive look-ahead)
by rubasov (Friar) on Mar 10, 2010 at 15:42 UTC
    In short: because the lookahead you're using is a zero-width assertion.

    In a more explanatory style: your second regex means something like this: match a URL part not containing slashes, followed by a slash (but do not include this slash in the currently matched substring), and followed by the end of the string.

    url part not containing slashes -|- followed by a slash |- followed by the end of the string
    Both "followed by" clauses must be fulfilled at the same position, directly after the matched part; however, your URL does not contain such a substring.

    update: corrected a negation error above, thanks ikegami.
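
    To see what "zero-width" means in practice, one can look at where the working regex stops consuming. A rough sketch follows; the helper consumed_and_rest is invented for this illustration and relies only on the built-in @- and @+ offset arrays.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text consumed by the working regex and
# whatever is left in the string after the match ends.
sub consumed_and_rest {
    my ($url) = @_;
    return unless $url =~ m![^/]+(?=/$)!;
    # @- and @+ hold the offsets of the consumed text only.
    my ($start, $end) = ($-[0], $+[0]);
    return (substr($url, $start, $end - $start), substr($url, $end));
}

my ($consumed, $rest) = consumed_and_rest('http://example.com/user/1234/');
print "consumed='$consumed' rest='$rest'\n";
```

    The slash is checked by the look-ahead but stays outside the match, so it should show up in the rest, not in the consumed text. An anchor placed after the look-ahead is therefore tested at the position just before that slash.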

Re: regex issue (positive look-ahead)
by Anonymous Monk on Mar 10, 2010 at 15:43 UTC
    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new( qr![^/]+(?=/$)! )->explain;
    __END__
    The regular expression:

    (?-imsx:[^/]+(?=/$))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      [^/]+                    any character except: '/' (1 or more
                               times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      (?=                      look ahead to see if there is:
    ----------------------------------------------------------------------
        /                        '/'
    ----------------------------------------------------------------------
        $                        before an optional \n, and the end of
                                 the string
    ----------------------------------------------------------------------
      )                        end of look-ahead
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new( qr![^/]+(?=/)$! )->explain;
    __END__
    The regular expression:

    (?-imsx:[^/]+(?=/)$)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      [^/]+                    any character except: '/' (1 or more
                               times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      (?=                      look ahead to see if there is:
    ----------------------------------------------------------------------
        /                        '/'
    ----------------------------------------------------------------------
      )                        end of look-ahead
    ----------------------------------------------------------------------
      $                        before an optional \n, and the end of the
                               string
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
