PerlMonks  

Re: Regular Expressions

by imp (Priest)
on Nov 27, 2006 at 14:08 UTC


in reply to Regular Expressions

YAPE::Regex::Explain is very helpful when debugging a regular expression. You can use it like this:

use strict;
use warnings;
use YAPE::Regex::Explain;

my $regexp = qr/^(.*?)((=<)|[<=>])(.*)/;
my $exp = YAPE::Regex::Explain->new($regexp);
print $exp->explain;

The output is as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    (                        group and capture to \3:
----------------------------------------------------------------------
      =<                       '=<'
----------------------------------------------------------------------
    )                        end of \3
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    [<=>]                    any character of: '<', '=', '>'
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  (                        group and capture to \4:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \4
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

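To see what that explanation means in practice, here is a small sketch that runs the same regex against a few inputs and prints captures \1, \2 and \4. The helper name split_expr and the sample strings are made up for illustration; they are not from your post.

```perl
use strict;
use warnings;

# The regex from the question, unchanged.
my $regexp = qr/^(.*?)((=<)|[<=>])(.*)/;

# Hypothetical helper: return captures 1, 2 and 4 (LHS, operator, RHS),
# or an empty list when the regex does not match.
sub split_expr {
    my ($text) = @_;
    return $text =~ $regexp ? ($1, $2, $4) : ();
}

for my $text ('a=<b', 'a<=b', 'a=b') {
    my @parts = split_expr($text);
    printf "%-6s %s\n", $text, join(' | ', map { "[$_]" } @parts);
}
```

Note what happens with 'a<=b': the lazy .*? stops at the first operator character, so the operator is captured as '<' and the '=' ends up at the start of the right-hand side.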
Another useful tool is the 'x' modifier, which allows whitespace in the regex. I consider regex to be an extremely dense programming language, and without whitespace to organize your thoughts it is very easy to get lost in the noise.

Here is your regex, using the 'x' modifier:

my $re = qr{
    ^(.*?)
    (
        (=<)
        |
        [<=>]
    )
    (.*)
}x;

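As a quick check that the 'x' layout changes nothing about what is matched, the sketch below compares the compact and the spaced-out form on a few inputs. The helper name captures and the sample strings are my own, for illustration only.

```perl
use strict;
use warnings;

my $compact = qr/^(.*?)((=<)|[<=>])(.*)/;
my $spaced  = qr{
    ^(.*?)
    (
        (=<)
        |
        [<=>]
    )
    (.*)
}x;

# Hypothetical helper: all captures joined into one string,
# with an unset capture shown as '-'.
sub captures {
    my ($re, $text) = @_;
    my @c = $text =~ $re;
    return join ',', map { defined $_ ? $_ : '-' } @c;
}

for my $text ('a=<b', 'a > b', 'no operator') {
    my ($one, $two) = (captures($compact, $text), captures($spaced, $text));
    printf "%-12s %s\n", $text, $one eq $two ? "same ($one)" : 'DIFFERENT';
}

# Under 'x' a literal space in the pattern is ignored:
print "qr{a b}x matches 'ab'\n"          if 'ab'  =~ qr{a b}x;
print "qr{a b}x does not match 'a b'\n"  if 'a b' !~ qr{a b}x;
```

The flip side is that whitespace you actually want to match has to be written as \s, escaped, or put in a character class once 'x' is on.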
When writing a large regex there is a tradeoff between accuracy and readability, and it is sometimes tempting to keep the regex simple so that it stays maintainable. The 'x' modifier helps with this, because it lets you put comments in the regex. Here is a revised regex for you:

my $re = qr{
    ^                   # Beginning of string
    \s*                 # Optional whitespace
    ([a-zA-Z0-9_]+)     # Capture(1) Alphanumeric LHS
    \s*                 # Optional whitespace
    (                   # Capture(2) either:
        [<>!]=          #   <=, >=, !=
        |
        [<>=]           #   <, >, =
    )
    \s*                 # Optional whitespace
    ([a-zA-Z0-9_]+)     # Capture(3) Alphanumeric RHS
}x;

And if you would like to make it more readable you can separate some of the tokens into other variables, like this:
my $operand = '[a-zA-Z0-9_]+';

my $re = qr{
    \A                  # Beginning of string
    \s*                 # Optional whitespace
    ($operand)          # Capture(1) Alphanumeric LHS
    \s*                 # Optional whitespace
    (                   # Capture(2) either:
        [<>!]=          #   <=, >=, !=
        |
        [<>=]           #   <, >, =
    )
    \s*                 # Optional whitespace
    ($operand)          # Capture(3) Alphanumeric RHS
}x;

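The same idea can be taken one step further by pulling the operator out as well and building the pieces with qr//, so each fragment is compiled with its own flags. This is only a sketch of that variation; the names $operator and tokens are my own and the sample inputs are made up.

```perl
use strict;
use warnings;

# Each fragment is a compiled regex; the operator fragment uses 'x' itself.
my $operand  = qr{[a-zA-Z0-9_]+};
my $operator = qr{ [<>!]= | [<>=] }x;

my $re = qr{
    \A \s*
    ($operand)      # Capture(1) LHS
    \s*
    ($operator)     # Capture(2) operator
    \s*
    ($operand)      # Capture(3) RHS
}x;

# Hypothetical helper: return (lhs, operator, rhs), or an empty list.
sub tokens {
    my ($line) = @_;
    return $line =~ $re ? ($1, $2, $3) : ();
}

for my $line ('total >= 10', 'a!=b', 'a=>b') {
    my @t = tokens($line);
    printf "%-12s %s\n", $line, @t ? join(' ', map { "($_)" } @t) : 'no match';
}
```

Because a qr// fragment carries its own flags when it is interpolated, the operator fragment keeps working even if the outer regex is later written without 'x'.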
I noticed that your example input allowed '=>' instead of '>='. Maybe that is allowed in your locale?

Here is a functional test script for you. It matches the operators documented in the regex, but does not match '=>' or '=<'.

use strict;
use warnings;

my $operand = '[a-zA-Z0-9_]+';

my $re = qr{
    ^                   # Beginning of string
    \s*                 # Optional whitespace
    ($operand)          # Capture(1) Alphanumeric LHS
    \s*                 # Optional whitespace
    (                   # Capture(2) either:
        [<>!]=          #   <=, >=, !=
        |
        [<>=]           #   <, >, =
    )
    \s*                 # Optional whitespace
    ($operand)          # Capture(3) Alphanumeric RHS
}x;

while (my $line = <DATA>) {
    # Only lines that match are printed; '=>' and '=<' are skipped
    if ($line =~ $re) {
        my ($lhs, $operator, $rhs) = ($1, $2, $3);
        printf " (%4s) (%2s) (%4s)\n", $lhs, $operator, $rhs;
    }
}

__DATA__
a=b
a!=b
a<b
a>b
a=>b
a=<b
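If '=>' and '=<' do turn out to be valid spellings for you, one possible way to accept them is an extra alternative plus a small lookup that maps them to the usual form. This is a sketch under that assumption; the %canonical table and the parse_comparison helper are hypothetical names, not part of the script above.

```perl
use strict;
use warnings;

my $operand = '[a-zA-Z0-9_]+';

my $re = qr{
    \A \s*
    ($operand)          # Capture(1) LHS
    \s*
    (                   # Capture(2) either:
        [<>!]=          #   <=, >=, !=
        |
        =[<>]           #   =<, =>  (assumed alternate spellings)
        |
        [<>=]           #   <, >, =
    )
    \s*
    ($operand)          # Capture(3) RHS
}x;

# Assumed mapping from the reversed spellings to the usual ones.
my %canonical = ('=<' => '<=', '=>' => '>=');

# Hypothetical helper: return (lhs, operator, rhs) with the operator
# normalised, or an empty list when the line does not match.
sub parse_comparison {
    my ($line) = @_;
    return unless $line =~ $re;
    my ($lhs, $op, $rhs) = ($1, $2, $3);
    $op = $canonical{$op} if exists $canonical{$op};
    return ($lhs, $op, $rhs);
}

for my $line ('a=>b', 'a=<b', 'a>=b', 'a=b') {
    my @parts = parse_comparison($line);
    printf "%-6s %s\n", $line, @parts ? "@parts" : 'no match';
}
```

The two-character alternatives come before [<>=] so that '=>' is not read as a bare '=' followed by an operand that fails to match.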

