http://www.perlmonks.org?node_id=586257


in reply to Regular Expressions

YAPE::Regex::Explain is very helpful when debugging a regular expression. You can use it like this:
use strict; use warnings; use YAPE::Regex::Explain; my $regexp = qr/^(.*?)((=<)|[<=>])(.*)/; my $exp = YAPE::Regex::Explain->new($regexp); print $exp->explain;
The output is as follows:
NODE EXPLANATION ---------------------------------------------------------------------- (?-imsx: group, but do not capture (case-sensitive) (with ^ and $ matching normally) (with . not matching \n) (matching whitespace and # normally): ---------------------------------------------------------------------- ^ the beginning of the string ---------------------------------------------------------------------- ( group and capture to \1: ---------------------------------------------------------------------- .*? any character except \n (0 or more times (matching the least amount possible)) ---------------------------------------------------------------------- ) end of \1 ---------------------------------------------------------------------- ( group and capture to \2: ---------------------------------------------------------------------- ( group and capture to \3: ---------------------------------------------------------------------- =< '=<' ---------------------------------------------------------------------- ) end of \3 ---------------------------------------------------------------------- | OR ---------------------------------------------------------------------- [<=>] any character of: '<', '=', '>' ---------------------------------------------------------------------- ) end of \2 ---------------------------------------------------------------------- ( group and capture to \4: ---------------------------------------------------------------------- .* any character except \n (0 or more times (matching the most amount possible)) ---------------------------------------------------------------------- ) end of \4 ---------------------------------------------------------------------- ) end of grouping ----------------------------------------------------------------------
Another useful tool is to use the 'x' modifier to allow whitespace in the regex. I consider regex to be an extremely dense programming language, and without the whitespace to organize your thoughts it is very easy to get lost in the noise.

Here is your regex, using the 'x' modifier:

my $re = qr{ ^(.*?) ( (=<) | [<=>] ) (.*) }x;
When writing a large regex it is a tradeoff between accuracy and readability. It is sometimes tempting to keep it simple so the regex is maintainable. 'x' is useful for addressing this problem, as you can put comments in the regex. Here is a revised regex for you:
my $re = qr{ ^ # Beginning of line \s* # Optional whitespace ([a-zA-Z0-9_]+) # Capture(1) Alphanumeric LHS \s* # Optional whitespace ( # Capture(2) either: [<>!]= # <=, >=, != | [<>=] # <, >, = ) \s* # Optional whitespace ([a-zA-Z0-9_]+) # Capture(3) Alphanumeric RHS }x;
And if you would like to make it more readable you can separate some of the tokens into other variables, like this:
my $operand = '[a-zA-Z0-9_]+' ; my $re = qr{ \A # Beginning of line \s* # Optional whitespace ($operand) # Capture(1) Alphanumeric LHS \s* # Optional whitespace ( # Capture(2) either: [<>!]= # <=, >=, != | [<>=] # <, >, = ) \s* # Optional whitespace ($operand) # Capture(3) Alphanumeric LHS }x;
I noticed that your example input allowed '=>' instead of '>=', maybe in your locale that is allowed?

Here is a functional test script for you. It matches the items documented in the regex, but does not match '=>' or '=<' (Is that allowed in your locale?)

use strict; use warnings; my $operand = '[a-zA-Z0-9_]+' ; my $re = qr{ ^ # Beginning of string \s* # Optional whitespace ($operand) # Capture(1) Alphanumeric LHS \s* # Optional whitespace ( # Capture(2) either: [<>!]= # <=, >=, != | [<>=] # <, >, = ) \s* # Optional whitespace ($operand) # Capture(1) Alphanumeric LHS }x; while (my $line = <DATA>) { my ($lhs,$operator,$rhs) = $line =~ $re; if ($line =~ $re) { my ($lhs,$operator,$rhs) = ($1,$2,$3); printf " (%4s) (%2s) (%4s)\n", $lhs, $operator, $rhs; } } __DATA__ a=b a!=b a<b a>b a=>b a=<b