http://www.perlmonks.org?node_id=298971

There's been a meditation called Regexes on Streams recently that deals with the evil I've been doing in the File::Stream module and I received much valuable feedback. (Much thanks to those who offered their advice.) I suggest you have a look at the above node first because this is what has happened to the module since.

The find() method is used internally by readline(). This is where the weirdness happens.

Starting out with getting the function arguments (@terms is the set of strings/regexes/objects to incorporate into out regular expression). "use re 'eval'" is needed for the ${} regex construct which we'll be using to do action-at-a-distance when reaching the end of the current string buffer. $End_Of_String is a global that will be incremented on encountering the end of the buffer in the regex. Lexicals did not work here due to some ${} weirdness.

sub find { my $self = shift; my @terms = @_; use re 'eval'; $End_Of_String = 0;
Transforming the input strings/regexes/objects into compiled regexes first (the second map). Then, every regex is deparsed using YAPE::Regex and reconstructed as a string with '(?:\z(?{$End_Of_String++})(?!)|)' after every token. The result is then compiled.
my @regex_tokens = map { my $yp = YAPE::Regex->new($_); my $str = ''; my $token; while ($token = $yp->next()) { $str .= $token->string() . '(?:\z(?{$End_Of_String++})(?!)|)'; } qr/$str/; } map { if ( not ref($_) ) { qr/\Q$_\E/; } elsif ( ref($_) eq 'Regexp' ) { $_; } else { my $string = "$_"; qr/\Q$string\E/; } } @terms;
Some more on that weird piece of regular expression:
'(?:\z(?{$End_Of_String++})(?!)|)'
We match for \z, the end of the string. If that isn't currently the case, the | way at the end of the regex comes in and matches the empty string. Voila - effectively a no-op unless at the end of the string. If the \z matches, the code in (?{}) is executed (that is, $End_Of_String incremented).

Now we construct one final regex with capturing parens around each of the bunch of we munged above. We compile it.

my $re = '(' . join( ')|(', @regex_tokens ) . ')'; my $compiled = qr/$re/s;
We match against the buffer. If either the $End_Of_String var was set via the regex or we didn't match the string at all, the global's reset and we append more data to the buffer. Repeat until match.
Once we have a match, we determine which capturing group matched and remove everything up to after the match from the buffer. Then, the string pre-match and the match itself are returned from find().
while (1) { my @matches = $self->{buffer} =~ $compiled; if ($End_Of_String or not @matches) { $End_Of_String = 0; return undef unless $self->fill_buffer(); next; } else { my $index = undef; for ( 0 .. $#matches ) { $index = $_, last if defined $matches[$_]; } die if not defined $index; # sanity check my $match = $matches[$index]; $self->{buffer} =~ s/^(.*?)\Q$match\E//s or die; return ( $1, $match ); } } }

Can you spot any bugs? (I found one while writing the above.)

Steffen