
Re: what would you like to see in perl5.12?

by bart (Canon)
on Aug 20, 2007 at 11:45 UTC

in reply to what would you like to see in perl5.12?

First of all, the things that are promised for 5.10. :) These include:
  • defined or: // and //=
  • speed improvements in regexes as promised (and implemented) by demerphq
  • recursive regexes! (ditto)

Aside from that, I'd like to see support for matching regexes across the boundaries of partially loaded buffers. That would make it easy to process files in blocks of a few kB each, instead of having to slurp the entire file into a single string.

As an example: say you're looking for a word "SELECT" and the buffer contains:

my $sth = $dbh->prepare('SEL
It's possible that it would have matched "SELECT" if the buffer hadn't been cut off there.

I'd like regexes to be able to catch that. Automatically.

I don't really care how it's done, but I personally favor a system that takes some action (die, set a variable, call a callback sub) when the lookahead "touches" the back end of the buffer. (I call that the "electric fence" approach: touch it and you're dead.)
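In the absence of such a feature, the usual workaround is to read in chunks and carry over a tail of (longest possible match − 1) bytes from each chunk, so a keyword straddling a chunk boundary still matches. A minimal sketch (the function name and chunk size are illustrative, not an existing API, and it assumes the keyword cannot overlap itself):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count occurrences of a fixed keyword in a stream read in fixed-size
# chunks. Keeping the last (length-1) bytes of each buffer as a carry
# means no complete match can hide entirely inside the carry, so matches
# split across chunk boundaries are still found exactly once.
sub count_keyword_in_stream {
    my ($fh, $keyword, $chunk_size) = @_;
    my $overlap = length($keyword) - 1;
    my ($carry, $count) = ('', 0);
    while (read($fh, my $chunk, $chunk_size)) {
        my $buf = $carry . $chunk;
        $count++ while $buf =~ /\Q$keyword\E/g;
        $carry = substr($buf, -$overlap);   # tail for the next round
    }
    return $count;
}
```

This is exactly the kind of bookkeeping the "electric fence" approach would make unnecessary, and it only works for patterns with a known maximum match length; for an unbounded pattern like `\d+` there is no safe carry size.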

Re^2: what would you like to see in perl5.12?
by sgt (Deacon) on Aug 22, 2007 at 20:56 UTC

    Yes, I agree completely. This opens the realm of stream regexps and would greatly facilitate the construction of regexp-based tokenizers (scalar m//gc), which need to process their input in chunks. Currently you have to resort to contorted hacks to do stream tokenizing; a pity, as this limits the implementation of generic parser generators in pure Perl.
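    To make the m//gc style concrete, here is a minimal `\G`-anchored tokenizer sketch over a fully loaded string (the token names and helper are illustrative). `pos()` carries the scanner state between calls, but only within one complete string; a token cut off at a chunk boundary (say `12` when `123` is still to come) would be mis-lexed, which is exactly the engine-state problem described here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the next token from the scalar referenced by $src, or an empty
# list at end of input. The /c modifier keeps pos() on failure, so the
# alternatives below can be tried in sequence from the same position.
sub next_token {
    my ($src) = @_;          # a scalar ref, so pos() persists on it
    for ($$src) {
        /\G\s+/gc;                            # skip whitespace
        return ['NUM',  $1] if /\G(\d+)/gc;
        return ['WORD', $1] if /\G(\w+)/gc;
        return ['OP',   $1] if /\G([^\s\w])/gc;
        return;                               # end of input
    }
}
```

    Calling `next_token(\$source)` in a loop yields one token per call; the whole scanner state is just `pos($source)`, which is why being able to checkpoint the engine mid-match at a buffer edge would be enough to make this work on streams.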

    What is needed is a way to keep the state of the regexp engine at the end of the buffer (the end-of-buffer-match case), so that when you add another chunk, the engine does not have to start again from the beginning. Considering all the goodies added by demerphq, maybe there is hope ;) of seeing something soon.

    Also, I'd like to be able to switch to a smaller but faster regexp implementation just for a block, or to locally turn off parts of the main engine that I know I am not going to use in a given block (supposing, of course, that doing so gives extra speed).

    cheers --stephan
