http://www.perlmonks.org?node_id=854220

bharatbsharma has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I am trying to build my know-how on Perl regular expressions from the Cookbook (http://docstore.mik.ua/orelly/perl/cookbook) and I am getting stuck at some points. How should I read the following?
/(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
~ s/man(?=\d)/cat/
What does '(?=\d)' signify here? Is there any good link that explains all the nitty-gritty of regular expressions from a basic level? Thanks in advance, Bharat

Replies are listed 'Best First'.
Re: Reading Reg Exp
by Ratazong (Monsignor) on Aug 11, 2010 at 06:34 UTC

    There are many resources on this in the tutorials-section.

    Another resource I find helpful is this online analyzer. It gives you at least a hint about what to search for, e.g. for your second example it shows that the ?= is a ZeroWidthPositiveLookahead. Google it to gain deeper knowledge ;-)

    HTH, Rata
Re: Reading Reg Exp
by kejohm (Hermit) on Aug 11, 2010 at 07:59 UTC

    You could also try YAPE::Regex::Explain. Thanks to Toolic for suggesting this module in a previous post. Here is an example using one of your regexes:

    #!perl
    use strict;
    use warnings;
    use YAPE::Regex::Explain;

    print YAPE::Regex::Explain->new(qr/(?:\w+\s+fish\s+){2}(\w+)\s+fish/i)->explain();

    __END__
    The regular expression:

    (?i-msx:(?:\w+\s+fish\s+){2}(\w+)\s+fish)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?i-msx:                 group, but do not capture (case-insensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (?:                      group, but do not capture (2 times):
    ----------------------------------------------------------------------
        \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                                 more times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
        \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                                 or more times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
        fish                     'fish'
    ----------------------------------------------------------------------
        \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                                 or more times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      ){2}                     end of grouping
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                                 more times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      fish                     'fish'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
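
To see the explained pattern at work, here is a minimal sketch. The sample sentence and the variable names are assumptions chosen for illustration (they do not appear in the thread), and the /x layout is just one way to make the pattern easier to read, not the only one.

```perl
use strict;
use warnings;

# Sample input: an assumption for illustration, not from the original post.
my $text = 'One fish two fish red fish blue fish';

# Same pattern as in the question, spread out with /x so each piece can be commented.
my $third_fish_re = qr/
    (?: \w+ \s+ fish \s+ ){2}   # skip two "word fish" groups without capturing
    ( \w+ )                     # capture the word before the third fish (group 1)
    \s+ fish                    # the third "fish" itself
/xi;

my $third;
if ($text =~ $third_fish_re) {
    $third = $1;
    print "The third fish is a $third one.\n";   # prints: The third fish is a red one.
}
```

The (?: ... ) group is repeated twice but never captured, so the only capture group is the word in front of the third "fish".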

    Update: Link fixed.

      \s+                      whitespace (\n, \r, \t, \f, and " ")
      That's actually incorrect. \s matches 25 different characters, although locale (and EBCDIC) can change the set of characters matched. Even in the LATIN-1 range, next line ("\x85") and no-break space ("\xA0") will be matched by \s if either the pattern or subject has the UTF-8 flag set.
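
A small sketch of the UTF-8 flag effect described above. It assumes an ASCII-based perl running with the default regex semantics (no `use locale`, no `unicode_strings` feature, no /u or /a modifier); the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $nbsp = "\xA0";    # no-break space, stored as a single native byte

# Without the UTF-8 flag, native (non-Unicode) rules apply to this character.
my $matches_as_bytes = ($nbsp =~ /\s/) ? 1 : 0;

# utf8::upgrade changes only the internal representation (it sets the UTF-8 flag);
# the string still holds the same single character.
utf8::upgrade($nbsp);
my $matches_upgraded = ($nbsp =~ /\s/) ? 1 : 0;

print "before upgrade: $matches_as_bytes, after upgrade: $matches_upgraded\n";
```

Under those assumptions the first test should fail and the second should succeed, even though the string compares equal before and after the upgrade.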

        YAPE::Regex::Explain is probably only set up for the most common uses; since it uses YAPE::Regex to parse the regex, it probably can't detect encoding or locale. Since it is only providing an explanation of the regex, in most cases it wouldn't really matter.

Re: Reading Reg Exp
by planetscape (Chancellor) on Aug 11, 2010 at 09:29 UTC
Re: Reading Reg Exp
by suhailck (Friar) on Aug 11, 2010 at 06:29 UTC
Re: Reading Reg Exp
by dasgar (Priest) on Aug 11, 2010 at 13:16 UTC

    If you're wanting to learn about regular expressions, I'd recommend checking out Mastering Regular Expressions. The author not only explains regular expressions, but also discusses how regular expression engines work and how that impacts the regular expression syntax. It's a good reference book to have handy.

    As for your question on (?=\d), that's a zero-width positive lookahead. Basically, your second regular expression looks for the first occurrence of 'man' that is immediately followed by a digit and replaces that 'man' with 'cat'. The digit itself stays in place, because the lookahead only checks for it and does not consume it. Only the first occurrence is replaced because there is no /g modifier; that's assuming that you didn't accidentally forget to include modifiers after the last / in your post.

    For example, if your string was man mantis man1 man2 man15, the second regular expression that you provided would change that to man mantis cat1 man2 man15.
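
The example above can be checked with a short sketch. The variable names are illustrative, and the /g variant is an addition for comparison only; it is not part of the original question.

```perl
use strict;
use warnings;

my $input = 'man mantis man1 man2 man15';

# Without /g, only the first "man" that is followed by a digit is replaced.
(my $first_only = $input) =~ s/man(?=\d)/cat/;

# With /g, every "man" that is followed by a digit is replaced.
(my $all = $input) =~ s/man(?=\d)/cat/g;

print "$first_only\n";   # man mantis cat1 man2 man15
print "$all\n";          # man mantis cat1 cat2 cat15
```

In both cases 'man' and 'mantis' are left alone, because neither is followed directly by a digit.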

Re: Reading Reg Exp
by biohisham (Priest) on Aug 11, 2010 at 08:04 UTC
    I find YAPE::Regex::Explain quite interesting ...


    Excellence is an Endeavor of Persistence. A Year-Old Monk :D .
Re: Reading Reg Exp
by ww (Archbishop) on Aug 11, 2010 at 13:40 UTC
    Downvoted for use of pirate-site URL in OP.