I'm pushing the limits of Perl's regexes, and I've come across an ugliness. I'm trying to write a simple parser that produces a tree structure that represents the data being parsed. (Specifically, parsing eBay search strings into a logic tree.) It appears that the "postponed regular expression" assertion,
(??{ CODE }), does not play well with capturing groups. Observe:
# prints 'j'
"japhy" =~ m{ (.) (?{ print $1 }) }x;
# prints nothing (undef, specifically)
$rx = qr{ (.) (??{ print $1 }) }x;
"japhy" =~ m{ (??{ $rx }) }x;
I know it's "experimental", but if this doesn't work now, it probably hasn't worked ever, which means nothing's been done about it, and I'm sure it's been reported as a bug before. The work-around I'm employing is shown in my code below. The code I'm showing is a proof-of-concept that
$^R can be used in conjunction with
(??{ ... }), although I'm sure I'm not the first person to attempt this.
use Data::Dumper;
$Data::Dumper::Indent = 1;
use strict;
sub ebay_search_logic {
my $str = shift;
my ($word, $neg, $alt);
$word = qr{ (?{ save_pos() }) (\w+) (?{ push_word() }) }x;
$neg = qr{ - (??{ $word }) (?{ mod_neg() }) }x;
$alt = qr{ \( (??{ $word }) (?{ alt1(); }) (?: , (??{ $word }) (?{ a
+lt2() }) )+ \) }x;
return $str =~ m{
(?{ [] })
^ \s*
(?: (??{ $word }) | (??{ $neg }) | (??{ $alt }) )
(?: \s+ (?: (??{ $word }) | (??{ $neg }) | (??{ $alt }) ) )*
\s* $
(?{ print Dumper($^R); $^R; })
}x;
return $str;
}
print ebay_search_logic("this that those"), "\n"; # LIKE 'this' AND
+ LIKE 'that' AND LIKE 'those'
print ebay_search_logic("this -that those"), "\n"; # LIKE 'this' AND
+ (NOT LIKE 'that') AND LIKE 'those'
print ebay_search_logic("this (that,those)"), "\n"; # LIKE 'this' AND
+ (LIKE 'that' OR LIKE 'those')
sub save_pos {
my @r = @{ $^R };
[ @r, $+[0] ];
}
sub push_word {
my @r = @{ $^R };
my $p = pop @r;
my $w = substr($_, $p, $+[0] - $p);
[ @r, { WORD => $w } ];
}
sub mod_neg {
my @r = @{ $^R };
my $w = pop @r;
[ @r, { NOT => $w->{WORD} } ];
}
sub alt1 {
my @r = @{ $^R };
my $w = pop @r;
[ @r, { ALT => [ $w->{WORD} ] } ];
}
sub alt2 {
my @r = @{ $^R };
my $w = pop @r;
my $alt = pop @r;
[ @r, { ALT => [ @{ $alt->{ALT} }, $w->{WORD} ] } ];
}
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.