Hi
I'm using the regex engine to identify delimited fields matching certain conditions.
Thanks to Perl's internal trie optimization of OR-conditions¹, it's far faster than using LIKE in MySQL, especially with hundreds of patterns to check.
my $text = 'A0 peter Z0 ... A42 peter, paul and mary Z42 ... A99 mary Z99';
my @or_matches = ( $text =~ m/A(\d+)[^Z]*(peter|mary)[^Z]*Z/g );
print "@or_matches \n";
__END__
0 peter 42 mary 99 mary
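With hundreds of patterns the alternation would presumably be generated rather than typed by hand. A minimal sketch of that, assuming the patterns are literal strings; `build_or_regex` is a hypothetical helper name, not something from the code above:

```perl
use strict;
use warnings;

# Hypothetical helper: join literal words into a single alternation so the
# regex engine sees one OR-group of fixed strings.
sub build_or_regex {
    my @words = @_;
    # quotemeta escapes regex metacharacters, so each word matches literally.
    my $alternation = join '|', map { quotemeta } @words;
    my $source = 'A(\d+)[^Z]*(' . $alternation . ')[^Z]*Z';
    return qr/$source/;
}

my $text = 'A0 peter Z0 ... A42 peter, paul and mary Z42 ... A99 mary Z99';
my $or_re = build_or_regex('peter', 'mary');
my @or_matches = ( $text =~ /$or_re/g );
print "@or_matches\n";    # prints: 0 peter 42 mary 99 mary
```

Whether the trie optimization actually kicks in for a given list is best checked with `use re 'debug'` rather than assumed.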
But now I have a requirement to find fields that match multiple regexes at the same time ... and AFAIK the regex grammar doesn't have an AND operator.
The best guess I have is using zero-width look-ahead assertions:
my $text = 'A0 peter Z0 ... A42 peter, paul and mary Z42 ... A99 mary Z99';
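my @and_matches = ( $text =~
    m/
        A(\d+)[^Z]*
        (
            (?=mary)        # mary starts here ...
            [^Z]*
            peter           # ... and peter follows in the same field
        |
            (?=peter)       # or peter starts here ...
            [^Z]*
            mary            # ... and mary follows in the same field
        )
        [^Z]*Z
    /xg );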
print "@and_matches \n";
__END__
42 peter, paul and mary
Well, that's already rather complicated for just two patterns ... and I doubt that it's fast ... any better suggestions?
UPDATE:
OK, the following is already much better, since it avoids or-chaining all possible orders of the patterns simply by anchoring the look-aheads at the field start.
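print @and_matches = ( $text =~
    m/
        A(\d+)
        (
            (?=             # somewhere in this field: mary
                [^Z]*
                mary
            )
            (?=             # somewhere in this field: peter
                [^Z]*
                peter
            )
            [^Z]*
        )
        Z\1                 # closing marker must repeat the field number
    /xg );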
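Since every look-ahead is anchored at the same spot, the regex can be assembled from a list with one look-ahead per pattern. A minimal sketch, assuming no pattern can match a `Z` and that each one is already a valid regex fragment; `build_and_regex` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: require every pattern to occur somewhere inside
# an A<n> ... Z<n> field, in any order.
sub build_and_regex {
    my @patterns = @_;
    # One look-ahead per pattern, all starting right after the field number.
    my $lookaheads = join '', map { "(?=[^Z]*(?:$_))" } @patterns;
    my $source = 'A(\d+)(' . $lookaheads . '[^Z]*)Z\1';
    return qr/$source/;
}

my $text = 'A0 peter Z0 ... A42 peter, paul and mary Z42 ... A99 mary Z99';
my $and_re = build_and_regex('mary', 'peter', 'paul');

my @ids;
while ( $text =~ /$and_re/g ) {
    push @ids, $1;    # $1 is the field number, $2 the field body
}
print "@ids\n";       # prints: 42
```

The regex grows linearly with the number of patterns instead of with the number of their orderings, but each look-ahead still rescans the field, so it is worth benchmarking against the data in question.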
Footnotes:
¹) Perl >= 5.10 IIRC