I'm pushing the limits of Perl's regexes, and I've come across an ugliness. I'm trying to write a simple parser that produces a tree structure that represents the data being parsed. (Specifically, parsing eBay search strings into a logic tree.) It appears that the "postponed regular expression" assertion,
(??{ CODE }), does not play well with capturing groups. Observe:
# prints 'j'
"japhy" =~ m{ (.) (?{ print $1 }) }x;
# prints nothing (undef, specifically)
$rx = qr{ (.) (??{ print $1 }) }x;
"japhy" =~ m{ (??{ $rx }) }x;
I know it's "experimental", but if this doesn't work now, it probably hasn't worked ever, which means nothing's been done about it, and I'm sure it's been reported as a bug before. The work-around I'm employing is shown in my code below. The code I'm showing is a proof-of-concept that
$^R can be used in conjunction with
(??{ ... }), although I'm sure I'm not the first person to attempt this.
use Data::Dumper;
$Data::Dumper::Indent = 1;
use strict;
sub ebay_search_logic {
my $str = shift;
my ($word, $neg, $alt);
$word = qr{ (?{ save_pos() }) (\w+) (?{ push_word() }) }x;
$neg = qr{ - (??{ $word }) (?{ mod_neg() }) }x;
$alt = qr{ \( (??{ $word }) (?{ alt1(); }) (?: , (??{ $word }) (?{ a
+lt2() }) )+ \) }x;
return $str =~ m{
(?{ [] })
^ \s*
(?: (??{ $word }) | (??{ $neg }) | (??{ $alt }) )
(?: \s+ (?: (??{ $word }) | (??{ $neg }) | (??{ $alt }) ) )*
\s* $
(?{ print Dumper($^R); $^R; })
}x;
return $str;
}
print ebay_search_logic("this that those"), "\n"; # LIKE 'this' AND
+ LIKE 'that' AND LIKE 'those'
print ebay_search_logic("this -that those"), "\n"; # LIKE 'this' AND
+ (NOT LIKE 'that') AND LIKE 'those'
print ebay_search_logic("this (that,those)"), "\n"; # LIKE 'this' AND
+ (LIKE 'that' OR LIKE 'those')
sub save_pos {
my @r = @{ $^R };
[ @r, $+[0] ];
}
sub push_word {
my @r = @{ $^R };
my $p = pop @r;
my $w = substr($_, $p, $+[0] - $p);
[ @r, { WORD => $w } ];
}
sub mod_neg {
my @r = @{ $^R };
my $w = pop @r;
[ @r, { NOT => $w->{WORD} } ];
}
sub alt1 {
my @r = @{ $^R };
my $w = pop @r;
[ @r, { ALT => [ $w->{WORD} ] } ];
}
sub alt2 {
my @r = @{ $^R };
my $w = pop @r;
my $alt = pop @r;
[ @r, { ALT => [ @{ $alt->{ALT} }, $w->{WORD} ] } ];
}