
Re: what would you like to see in perl5.12?

by bart (Canon)
on Aug 20, 2007 at 11:45 UTC (#633776)

in reply to what would you like to see in perl5.12?

First of all, the things that are promised for 5.10. :) These include:
  • defined or: // and //=
  • speed improvements in regexes as promised (and implemented) by demerphq
  • recursive regexes! (ditto)
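The first and third items can be tried directly on perl 5.10 or later. Below is a minimal sketch; `%opt` and `$balanced` are made-up illustrations, not taken from any real codebase.

```perl
use 5.010;      # defined-or and recursive regexes first appeared in Perl 5.10
use strict;
use warnings;

my %opt = (verbose => 0);

# || would replace the legitimate value 0 with 1; // only falls back on undef.
my $verbose = $opt{verbose} // 1;    # stays 0
my $level   = $opt{level}   // 3;    # 3, since $opt{level} is undef
$opt{name} //= 'default';            # assigns only because $opt{name} is undef

# (?1) re-enters capture group 1, so a single pattern can match
# arbitrarily deeply nested, balanced parentheses.
my $balanced = qr/^(\((?:[^()]++|(?1))*\))$/;

for my $s ('(a(b)c)', '((()))', '(a(b c)') {
    printf "%-8s %s\n", $s, ($s =~ $balanced ? 'balanced' : 'unbalanced');
}
```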

Aside from that, I'd like to see support for matching regexes across boundaries for partially loaded buffers. That would ease processing files in blocks of a few KB each, instead of having to load the entire file into a string.

As an example: say you're looking for the word "SELECT" and the buffer contains:

my $sth = $dbh->prepare('SEL
It's possible that it would have matched "SELECT" had the buffer not been cut off.

I'd like regexes to be able to catch that. Automatically.

I don't really care how it's done, but I personally favor a system that takes some action (die, set a variable, call a callback sub) when the lookahead "touches" the back end of the buffer. (I call that the "electric fence" approach: touch it and you're dead.)
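Until the engine can do this itself, the fence can be emulated by hand for a literal word. After scanning each chunk, check whether the buffer ends with a proper prefix of the word, and if it does, carry that tail into the next chunk. The following is a minimal sketch under that assumption. The helper `count_word_in_chunks` is hypothetical, and it only handles a fixed string, not an arbitrary regex.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of a literal $word
# in a stream delivered as @chunks, without joining the chunks first.
sub count_word_in_chunks {
    my ($word, @chunks) = @_;

    # Matches any proper, non-empty prefix of $word sitting at the very end
    # of the buffer, i.e. the "electric fence" case: a match may be cut off.
    my $prefixes = join '|',
        map { quotemeta(substr($word, 0, $_)) } 1 .. length($word) - 1;
    my $fence = qr/((?:$prefixes))\z/;

    my ($count, $carry) = (0, '');
    for my $chunk (@chunks) {
        my $buf      = $carry . $chunk;
        my $last_end = 0;
        while ($buf =~ /\Q$word\E/g) {
            $count++;
            $last_end = pos $buf;
        }
        # Only text after the last full match can start a split occurrence;
        # keep it for the next round instead of discarding it.
        my $rest = substr $buf, $last_end;
        $carry = $rest =~ $fence ? $1 : '';
    }
    return $count;
}

print count_word_in_chunks('SELECT', q{my $sth = $dbh->prepare('SEL},
                                     q{ECT * FROM t')}), "\n";    # prints 1
```

This is the "set a variable" flavour of the fence, done by hand. It only works because the pattern is a fixed string whose prefixes are easy to enumerate. For a general regex there is no comparably simple way in plain Perl to ask whether a match attempt ran off the end of the buffer.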

Re^2: what would you like to see in perl5.12?
by sgt (Deacon) on Aug 22, 2007 at 20:56 UTC

    Yes, I agree completely. This opens up the realm of stream regexps and would greatly facilitate the construction of regexp-based tokenizers (scalar m//gc) that need to process their input in chunks. Currently you have to resort to contorted hacks to do stream tokenizing, which is a pity, as it limits the implementation of generic parser generators in pure Perl.
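    Here is a minimal sketch of the scalar m//gc tokenizer style. The token classes and the name tokenize() are illustrative assumptions. On whole input it works. When chunks are tokenized independently, however, it reports SELECT as two tokens, which is the chunking problem in miniature.

```perl
use strict;
use warnings;

# Minimal scalar m//gc tokenizer. Each /\G.../gc attempt starts exactly where
# the previous token ended; /c keeps pos() in place when an attempt fails,
# so the next alternative can be tried at the same offset.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    while (pos($src) < length $src) {
        if    ($src =~ /\G\s+/gc)             { next }
        elsif ($src =~ /\G([A-Za-z_]\w*)/gc)  { push @tokens, [IDENT => $1] }
        elsif ($src =~ /\G(\d+)/gc)           { push @tokens, [NUM   => $1] }
        elsif ($src =~ /\G([-+*\/=();,])/gc)  { push @tokens, [OP    => $1] }
        else  { die 'Unexpected character at offset ' . pos($src) . "\n" }
    }
    return @tokens;
}

# Whole input: SELECT comes out as a single IDENT token.
print join(' ', map { $_->[0] . '(' . $_->[1] . ')' } tokenize('SELECT x, 42;')), "\n";
# Same text split into chunks tokenized separately: SELECT becomes SEL + ECT.
print join(' ', map { $_->[1] } tokenize('SEL'), tokenize('ECT x, 42;')), "\n";
```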

    What is needed is a way to keep the state of the regexp engine when a match reaches the end of the buffer (the end-of-buffer-match case), so that when you add another chunk, the engine does not start again from the beginning. Considering all the goodies added by demerphq, maybe there is hope ;) of seeing something soon.
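    Short of engine support, the usual workaround does exactly what this paragraph wants to avoid. When a token's match ends at the very end of the buffer, you put it back, wait for the next chunk, and rescan that token from its first character. The following is a minimal sketch of that workaround, assuming a hypothetical tokenize_stream() that only knows words and whitespace.

```perl
use strict;
use warnings;

# Hypothetical streaming tokenizer for words separated by whitespace.
# A word whose match reaches the end of the current buffer might continue
# in the next chunk, so it is held back and rescanned from its first
# character once more input arrives: the engine state is not kept.
sub tokenize_stream {
    my @chunks = @_;
    my @tokens;
    my $buf = '';
    while (@chunks) {
        $buf .= shift @chunks;
        my $at_eof = !@chunks;
        pos($buf) = 0;
        while (pos($buf) < length $buf) {
            my $start = pos $buf;
            next if $buf =~ /\G\s+/gc;
            if ($buf =~ /\G(\w+)/gc) {
                if (pos($buf) == length($buf) && !$at_eof) {
                    pos($buf) = $start;    # touched the end: undo and wait
                    last;
                }
                push @tokens, $1;
                next;
            }
            die "Unexpected character at offset $start\n";
        }
        # Drop the consumed text; keep any unfinished word for the next chunk.
        $buf = substr $buf, pos($buf);
    }
    return @tokens;
}

print join(',', tokenize_stream('SEL', 'ECT x', 'yz 4', '2')), "\n";    # prints SELECT,xyz,42
```

    Every held-back word is matched again from scratch after the next chunk arrives, so the work already done on it is thrown away. Keeping the engine's state across chunks would avoid exactly that.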

    I'd also like to be able to switch to a smaller but faster regexp implementation just for one block, or to turn off, locally, the parts of the main engine that I know I am not going to use in that block (supposing, of course, that doing so gives extra speed).

    cheers --stephan
