 
PerlMonks  

Re^3: Pattern matching

by kevbot (Priest)
on Nov 11, 2018 at 00:06 UTC (#1225551)


in reply to Re^2: Pattern matching
in thread Pattern matching

Hi nursyza,

I see that parv has already explained the regex pattern. I wanted to let you know that you can use the YAPE::Regex::Explain module to get an explanation of any regular expression pattern. Once you have the module installed, you can do something like this at the command line to get the explanation for your pattern:

perl -MYAPE::Regex::Explain -E 'say YAPE::Regex::Explain->new("\b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)]")->explain'
This gives the following output:
The regular expression:

(?-imsx: (MODULE s+ [A-Z]+[0-9]+) s* [(] .+? [)])

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
                           ' '
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    MODULE                   'MODULE '
----------------------------------------------------------------------
    s+                       's' (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
                             ' '
----------------------------------------------------------------------
    [A-Z]+                   any character of: 'A' to 'Z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
                           ' '
----------------------------------------------------------------------
  s*                       's' (0 or more times (matching the most
                           amount possible))
----------------------------------------------------------------------
                           ' '
----------------------------------------------------------------------
  [(]                      any character of: '('
----------------------------------------------------------------------
                           ' '
----------------------------------------------------------------------
  .+?                      any character except \n (1 or more times
                           (matching the least amount possible))
----------------------------------------------------------------------
                           ' '
----------------------------------------------------------------------
  [)]                      any character of: ')'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

You may also want to look at perlre to get more familiar with regular expressions.

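UPDATE: As parv, soonix, and AnomalousMonk pointed out in the replies to this node, the above usage of YAPE::Regex::Explain is not correct. Passing the regex as a double-quoted string caused problems: Perl processed the escapes before the module ever saw the pattern, so \s became a plain s, \b became a backspace character, and the /x flag was lost as well.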
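To see what the module actually received, here is a small sketch (not part of the original thread) that builds the pattern both ways and tries each on a sample line. The sample line `MODULE ABC123 (Introduction to Perl)` is made up for illustration; the real data may look different. Warnings are switched off for the double-quoted string only, so the script runs quietly.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample line, invented for illustration only.
my $line = 'MODULE ABC123 (Introduction to Perl)';

# The pattern as a double-quoted string, as in the one-liner above.
# String escapes are processed first: "\b" becomes a backspace character
# and "\s" becomes a plain "s". Warnings are disabled for this statement
# only, to keep the demo quiet.
my $as_string = do { no warnings; "\b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)]" };

# The same pattern compiled with qr//x: the regex engine sees \b and \s,
# and /x makes it ignore the spaces used for readability.
my $as_regex = qr/ \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] /x;

my $starts_with_backspace = substr($as_string, 0, 1) eq "\x08" ? 1 : 0;
my $string_matches        = $line =~ /$as_string/ ? 1 : 0;
my ($captured)            = $line =~ $as_regex;    # list context returns $1

print "string pattern begins with a backspace: $starts_with_backspace\n";
print "string pattern matches the sample:      $string_matches\n";
print "qr// pattern captures:                  ", $captured // 'nothing', "\n";
```

Run as-is, the string version should start with a backspace and fail to match, while the qr// version captures `MODULE ABC123`.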

The following code gives the correct output:

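#!/usr/bin/env perl
use strict;
use warnings;
use YAPE::Regex::Explain;

# qr// compiles the pattern, so \b and \s keep their regex meaning
# and the /x flag travels with it to YAPE::Regex::Explain.
my $re = qr/ \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] /x;
my $exp = YAPE::Regex::Explain->new($re)->explain;
print $exp;
exit;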
Here is the output:
The regular expression:

(?x-ims: \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] )

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?x-ims:                 group, but do not capture (disregarding
                         whitespace and comments) (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n):
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    MODULE                   'MODULE'
----------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    [A-Z]+                   any character of: 'A' to 'Z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                           or more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  [(]                      any character of: '('
----------------------------------------------------------------------
  .+?                      any character except \n (1 or more times
                           (matching the least amount possible))
----------------------------------------------------------------------
  [)]                      any character of: ')'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
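The explanation can be checked against a few inputs. This is a rough sketch; the test lines are invented, each chosen to exercise one node of the explanation: the optional \s*, the non-greedy .+?, the \b, case sensitivity, and the [0-9]+.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The pattern from the corrected script above.
my $re = qr/ \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] /x;

# Hypothetical test lines, one per feature of the explanation.
my @lines = (
    'MODULE ABC123 (Basics)',          # ordinary match
    'MODULE XY7(first) and (second)',  # \s* allows no space; .+? stops at the first ')'
    'SUBMODULE AB1 (nested)',          # no \b between "SUB" and "MODULE"
    'module ab1 (lowercase)',          # the pattern is case-sensitive
    'MODULE ABC (no digits)',          # [0-9]+ needs at least one digit
);

my %result;    # line => [ capture group 1, whole match ], or undef on failure
for my $line (@lines) {
    if ($line =~ $re) {
        $result{$line} = [ $1, $& ];
        printf "match:    %-32s \$1='%s'  whole='%s'\n", $line, $1, $&;
    }
    else {
        $result{$line} = undef;
        printf "no match: %s\n", $line;
    }
}
```

Only the first two lines should match. In the second one, .+? stops at the first closing parenthesis, so the whole match is `MODULE XY7(first)` rather than running on to `(second)`.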

Re^4: Pattern matching
by parv (Vicar) on Nov 11, 2018 at 01:53 UTC

    The output of your Y::R::E is very different from the one provided by AnomalousMonk. Yours is missing the word boundary (\b) and whitespace (\s) tokens. Is that due to a copy-paste problem or to your version of the Y::R::E module?

      Most probably due to passing the regex as string instead of using qr. Besides: the /x flag is missing, too, for the same reason.
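The point about /x can be shown in isolation with a minimal sketch: the two qr// objects below differ only in the /x modifier. The sample line is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $line = 'MODULE ABC123 (Basics)';   # hypothetical sample line

# Identical pattern text; the only difference is the /x modifier.
my $no_x   = qr/ \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] /;
my $with_x = qr/ \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] /x;

# Without /x every space is a literal character the input must contain,
# e.g. a space, then a word boundary, then another space before "(".
# A boundary can never sit between two spaces, so this cannot match.
my $no_x_matches   = $line =~ $no_x   ? 1 : 0;
my $with_x_matches = $line =~ $with_x ? 1 : 0;

print "without /x: $no_x_matches\n";
print "with /x:    $with_x_matches\n";

# A qr// object carries its flags with it when stringified or interpolated
# (on Perl 5.14 and later it stringifies as "(?^x:...)").
print "stringified: $with_x\n";
```

This is why passing a qr// object, rather than a string, keeps the /x flag attached to the pattern.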
        Most probably due to passing the regex as string instead of using qr.

        Most certainly due to this. If warnings had been enabled, some "Unrecognized escape \s passed through ..." messages would have been seen. (And \b is a backspace IIRC.) The problem could also have been avoided by defining the regex as a single-quoted string, but I would not recommend this, because I seem to remember that using single-quoted strings in this way still has some corner-case pitfalls. I agree: use of qr// is best here.


        Give a man a fish:  <%-{-{-{-<
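Both points in the reply above can be sketched in a few lines. The string eval is only a device to make the compile-time warnings catchable at run time. The '\\' case is one well-known single-quote pitfall, offered here as an assumption about the kind of corner case meant, not necessarily the one AnomalousMonk had in mind. The sample line is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Collect warnings in an array instead of letting them go to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# String eval compiles the double-quoted pattern at run time, so its
# compile-time "Unrecognized escape" warnings reach the handler above.
# The eval'd code inherits the "use warnings" in effect at this point.
my $mangled = eval q{ "\b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)]" };
die $@ unless defined $mangled;
print "caught: $_" for @warnings;

# Single quotes keep "\b" and "\s" intact, so this text works as a
# pattern when combined with /x ...
my $single    = ' \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] ';
my $single_ok = 'MODULE ABC123 (Basics)' =~ /$single/x ? 1 : 0;

# ... but '\\' is still collapsed to a single backslash. A pattern meant
# to match a literal backslash reaches the regex engine as a lone "\"
# and fails to compile ("Trailing \ in regex").
my $backslash_text = '\\';
my $compiled = eval { my $re = qr/$backslash_text/; 1 } ? 1 : 0;

# Written directly with qr//, \\ keeps its meaning of "literal backslash".
my $qr_ok = 'C:\\temp' =~ qr/\\/ ? 1 : 0;

print "single-quoted pattern matches:   $single_ok\n";
print "lone-backslash pattern compiles: $compiled\n";
print "qr// matches a backslash:        $qr_ok\n";
```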
