
Re^4: Find Prefix if regex didn't match

by space_monk (Chaplain)
on Oct 31, 2012 at 13:27 UTC ( #1001661=note )

in reply to Re^3: Find Prefix if regex didn't match
in thread Find Prefix if regex didn't match

I do not believe you gain anything from the strategy you suggest.

The only part of the text you can safely throw away is the part that does not match the leading "fixed" characters of the regex, and even then you must hold back a tail as long as the "fixed" character string, since that tail may contain the start of a match.

For example, a search for AB.*Z can only throw away text up to the first AB it encounters. From then on, greedy matching means it must hold on to all text until it encounters a Z, so even if the next Z is several million characters after the AB, the program must keep all of it and run the search from that AB.
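The trimming rule described above can be sketched roughly as follows. This is only an illustration, written in Python rather than Perl; the function name trim_buffer and its parameters are hypothetical, and it assumes the regex starts with a known literal prefix such as "AB". It keeps a tail of len(prefix) - 1 characters, which is the minimum that can still hold a partial prefix.

```python
def trim_buffer(buf: str, prefix: str) -> str:
    """Drop the leading text of buf that can never be part of a match
    for a regex that starts with the literal string prefix.

    Hypothetical helper for illustration only.
    """
    pos = buf.find(prefix)
    if pos != -1:
        # A complete prefix was seen: everything from there on must be
        # kept, because the rest of the pattern may match much later.
        return buf[pos:]
    # No complete prefix yet: only the last len(prefix) - 1 characters
    # could still be the beginning of a prefix completed by later input.
    keep = min(len(buf), len(prefix) - 1)
    return buf[len(buf) - keep:]


if __name__ == "__main__":
    print(trim_buffer("XXXXABfooZ", "AB"))  # text before the first AB is dropped
    print(trim_buffer("XXXXA", "AB"))       # only the possible partial prefix is kept
```

Note that once an AB has been found, the buffer can no longer shrink, which is the point made above about a Z that may be millions of characters away.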

In summary, if you are finding searches slow, you should perhaps look at running the search less often, for example as a scheduled task or whenever the text has grown by a set amount.


Replies are listed 'Best First'.
Re^5: Find Prefix if regex didn't match
by demoralizer (Sexton) on Nov 06, 2012 at 14:01 UTC
    This is exactly what I'm trying to do: "throwing away text until it encounters the first AB...".

    The reason is that I have a timeout by which the searched string must be found; otherwise it is an error. The reaction time should be as short as possible, so I have to scan every time I receive something. Doing the search less often will therefore not work (see the short example I posted: I search only as often as absolutely necessary, but no more). Searching only after the text has grown by a certain number of characters is not practicable either, because I may receive a short packet that contains the expression before the required size has been reached. Such cases would need another timeout, which would increase the reaction time more than necessary.

    In most cases the string I'm looking for occurs only once, and most of the time the scanned text doesn't even contain a prefix of it. However, it may take two TCP packets to receive the text (e.g. I'm looking for "ABC" and receive "XXA" in the first packet and "BCYYY" in the second), and I must not miss any pattern, so the only optimization I see is cutting the "head" away. The benchmark I gave shows that this works! That is because the whole text can become quite large and can arrive in several packets, isn't it?
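The split-packet case above ("XXA" followed by "BCYYY") can be illustrated with a small sketch. It is written in Python rather than Perl and is not the benchmark code from this thread; scan_chunks and its parameters are hypothetical names, and it assumes the search string is a plain literal. After each failed search it cuts the head away and keeps only the last len(literal) - 1 characters.

```python
import re


def scan_chunks(chunks, literal):
    """Return the index of the chunk that completes the first match of
    literal, or None if it never appears.

    Hypothetical helper for illustration only.
    """
    pattern = re.compile(re.escape(literal))
    keep = len(literal) - 1  # longest partial match that can straddle chunks
    buf = ""
    for index, chunk in enumerate(chunks):
        buf += chunk
        if pattern.search(buf):
            return index
        # Cut the head away: only the tail could still begin a match
        # that the next chunk completes.
        buf = buf[len(buf) - min(keep, len(buf)):]
    return None


if __name__ == "__main__":
    # "ABC" is split across two chunks, as in the example above.
    print(scan_chunks(["XXA", "BCYYY"], "ABC"))
```

With this approach the buffer never holds more than one chunk plus a short tail, so the cost of each search stays bounded however much text has been received in total.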
