http://www.perlmonks.org?node_id=298436

I suppose this is a bit of an expert question.
perlvar states that "[...] the value of "$/" is a string, not a regex." Which is sad. And no longer strictly true.

Doing regex matching on streams is tricky at best. To work really well in all cases, it would require a different regular expression engine than Perl's.

So I wrote File::Stream. Not implementing a regular expression engine (insert maniac laughter here), but implementing "regexes on streams" by means of progressive buffering -> matching -> buffer expansion -> matching... (see below for some comments on inherent problems with this approach)
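
To make the buffering -> matching -> buffer expansion cycle concrete, here is a minimal sketch of the idea. It is not File::Stream's actual code: the next_record function, its arguments and the tiny block size are made up for illustration, and it assumes the separator regex never matches the empty string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Stream): return the next record
# from $fh, where records are separated by the regex $sep.
# $buf_ref holds unread data between calls; $block is the read size in bytes.
sub next_record {
    my ($fh, $sep, $buf_ref, $block) = @_;
    $block ||= 1024;
    while (1) {
        # Matching step: try the separator against what is buffered so far.
        if ($$buf_ref =~ /$sep/) {
            my $record = substr($$buf_ref, 0, $-[0]);  # text before the match
            substr($$buf_ref, 0, $+[0], '');           # drop record and separator
            return $record;
        }
        # Expansion step: no match yet, so append another block and retry.
        my $got = read($fh, $$buf_ref, $block, length($$buf_ref));
        die "read failed: $!" unless defined $got;
        if ($got == 0) {                               # end of stream
            return unless length $$buf_ref;
            my $rest = $$buf_ref;
            $$buf_ref = '';
            return $rest;                              # last, unterminated record
        }
    }
}

# Usage: an in-memory filehandle stands in for a real stream.
open my $fh, '<', \"a , b,c ,  d" or die $!;
my $buf = '';
my @records;
while (defined(my $rec = next_record($fh, qr/\s*,\s*/, \$buf, 4))) {
    push @records, $rec;
}
print join('|', @records), "\n";   # prints a|b|c|d
```

Note that the buffer only grows while no separator is found, which is exactly where the memory problem mentioned below comes from.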

With the current implementation, you can already do things like this:

    use File::Stream;
    my $stream = File::Stream->new($filehandle);
    $/ = qr/\s*,\s*/;
    print "$_\n" while <$stream>;

It can also do quite a bit more, so consider having a look at the module's synopsis, the pasting of which is considered a waste of screen space here. A few important problems, however, remain.

Most importantly, infinite regexes (those that can match arbitrarily long input, such as anything with an unbounded quantifier) on streams tend to introduce infinite strings into your memory. Too bad we don't live in the ideal Turing machine world, but this can't be helped.
Furthermore, given that regexes are used on the current buffer, they may match less than they would if the next X bytes were also part of the buffer. Like the former issue, this likely cannot be fixed for good.
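
A tiny self-contained illustration of that second problem, using nothing but core Perl. The data and the 5-byte cut-off are invented for the example; the point is only that the same regex yields a shorter match when the buffer happens to end in the middle of the separator.

```perl
use strict;
use warnings;

my $sep    = qr/\s*,\s*/;
my $stream = "foo,   bar";              # the full data, as it would eventually arrive

# Pretend only the first 5 bytes have been read so far.
my $partial = substr($stream, 0, 5);    # "foo, "

my ($short) = $partial =~ /($sep)/;     # separator as seen in the partial buffer
my ($full)  = $stream  =~ /($sep)/;     # separator as seen with all the data

printf "partial buffer: separator is %d bytes\n", length $short;   # 2
printf "whole stream:   separator is %d bytes\n", length $full;    # 4
```

With the partial buffer, the next record would start with two stray spaces that really belong to the separator.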

  • Is there a robust, pure-Perl way of inspecting regular expressions for possibly infinite constructs? Anything non-extreme but involving XS?
  • The problem that a regex might match on the current buffer contents, but would match more if more data were available, might be halfway fixed by reading in another block from the stream and performing the match again. Repeat until the match stays the same over n read operations. Weirdness? Heuristic? Or a fix?
  • Is it possible to achieve usage like this:
    use File::Stream::Improved;
    $/ = qr/regex/;
    my @records = <HANDLE>; # where HANDLE might also be $handle
    The significant difference from the currently working code is that $handle/HANDLE needn't be a File::Stream tied handle, but may be just any filehandle.
    Any ideas?
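
To make the second question concrete, here is a rough sketch of the "keep reading until the match stops changing" idea. It is an assumption-laden illustration, not something File::Stream does today: stable_match, its parameters and the block size are hypothetical names chosen for this post.

```perl
use strict;
use warnings;

# Hypothetical sketch: find the separator in $$buf_ref, but keep reading
# blocks until the match offsets stayed the same over $n consecutive reads.
# Returns (start, end) offsets of the match, or an empty list if none.
sub stable_match {
    my ($fh, $sep, $buf_ref, $block, $n) = @_;
    my ($last, $stable) = ('', 0);
    while (1) {
        # Encode the current match position as "start:end" ('' means no match).
        my $key = $$buf_ref =~ /$sep/ ? "$-[0]:$+[0]" : '';
        # Count consecutive reads that left the match unchanged.
        $stable = ($key ne '' && $key eq $last) ? $stable + 1 : 0;
        $last   = $key;
        last if $stable >= $n;
        my $got = read($fh, $$buf_ref, $block, length($$buf_ref));
        die "read failed: $!" unless defined $got;
        last if $got == 0;    # end of stream: nothing can change the match now
    }
    return $last eq '' ? () : split /:/, $last;
}

# Usage: read 2 bytes at a time, require 2 unchanged reads.
open my $fh, '<', \"foo,   bar" or die $!;
my $buf = '';
my ($start, $end) = stable_match($fh, qr/\s*,\s*/, \$buf, 2, 2);
print "separator spans offsets $start..$end\n";   # 3..7
```

This still only narrows the problem: a separator that keeps growing for more than n blocks gets cut short anyway, and every extra read makes the buffer larger. Hence the question of whether this counts as a fix at all.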

Steffen