PerlMonks  

Re: Find Prefix if regex didn't match

by Anonymous Monk
on Oct 31, 2012 at 11:20 UTC (#1001652)

Re^2: Find Prefix if regex didn't match
by demoralizer (Acolyte) on Oct 31, 2012 at 12:06 UTC
    Thanks for your fast answer!

    Some good approaches, but none of them solves it...

    pos() doesn't work: if the string only contains a prefix of the given expression (which is exactly the case I'm looking for), the match fails and I get undef instead of the position I need. Or am I overlooking something here?

    My "window" can grow as large as it likes, that doesn't matter, but I have to ensure that no match is missed; that's quite important! The goal is just to make the search faster: if I knew that the first n characters could never be part of a match, I could throw them away, and that would do the job. Any alternative suggestions?
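To make the pos() complaint concrete, here is a minimal sketch (the sample string is made up for illustration). It shows that a failed m//g match resets pos() to undef, so pos() cannot report where a partial match began.

```perl
use strict;
use warnings;

# Hypothetical sample data: the buffer ends with "AB", which could be the
# start of a match for AB.*Z, but no Z has arrived yet.
my $string = "WWWAB";

# A successful m//g match in scalar context sets pos() to the offset
# just past the matched text.
my $ok = $string =~ /W+/g;
my $pos_after_success = pos($string);

# A failed m//g match (without the /c modifier) resets pos() to undef.
my $matched = ( $string =~ /AB.*Z/g ) ? 1 : 0;
my $pos_after_failure = pos($string);

printf "after success: %s\n", $pos_after_success;
printf "after failure: %s\n",
    defined $pos_after_failure ? $pos_after_failure : 'undef';
```

Adding the /c modifier would keep pos() where it was before the failed attempt, but that is still the old position, not the offset where the partial match started.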

      pos() doesn't work: if the string only contains a prefix of the given expression (which is exactly the case I'm looking for), the match fails and I get undef instead of the position I need. Or am I overlooking something here?

      Once again in English, please?

      Any alternative suggestions?

      Not really. Removing a prefix itself requires a regex match, and after that you still have to do the real matching. I doubt there are any savings to be had by matching twice ... or by actually cutting the string, even with pos.

      my $search="AB.*Z";
      my $string="WWWA";

      my $search_prefix = $1 if $search =~ /^(\w+)/g;
      warn $search_prefix;
      my $prefix_offset = index ( $string, $search_prefix );
      substr $string, 0, $prefix_offset , '';
      warn $string;
      $string = "WWWADBBBABC";
      $prefix_offset = index ( $string, $search_prefix );;
      warn $string;
      substr $string, 0, $prefix_offset , '';
      warn $string;
      $string = "WWWADBBBABC";
      pos( $string ) = $prefix_offset;
      warn pos( $string ); ## next match m//g starts at offset
      __END__
      AB at jank line 5.
      A at jank line 8.
      WWWADBBBABC at jank line 11.
      ABC at jank line 13.
      8 at jank line 16.

      For some idea of why I think so, maybe see "Why does global match run faster than none global?" and "Multiple Regex evaluations or one big one?"

        I do not believe you are gaining anything by the strategy you suggest in your problem.

        The only part of the text you can safely throw away is the part that precedes any occurrence of the leading "fixed" characters in the regex, and even then you must keep a tail about as long as the "fixed" string, because it may hold the beginning of an occurrence that later text completes.

        For example, a search for AB.*Z can only throw away text up to the first AB it encounters. From then on, greedy matching means it must hold on to all text until it encounters a Z, so even if the next Z is several million characters after the AB, the program must keep all of it and run the search from that AB.
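The trimming rule described above could be sketched roughly as follows. trim_window() is a hypothetical helper, not part of any module, and it assumes the caller already knows a fixed string ($literal) that every match must start with. Note that index returns -1 when the literal is absent, so that case needs its own branch rather than being passed straight to substr.

```perl
use strict;
use warnings;

# trim_window() is a hypothetical helper for illustration only.
# $literal is assumed to be a fixed string that every match must begin
# with (e.g. "AB" for the pattern AB.*Z).
sub trim_window {
    my ($buffer, $literal) = @_;

    my $at = index($buffer, $literal);
    if ($at >= 0) {
        # Text before the first occurrence can never start a match.
        return substr($buffer, $at);
    }

    # No occurrence yet: keep the last length($literal) - 1 characters,
    # since they might be the start of $literal once more text arrives.
    my $keep = length($literal) - 1;
    $keep = length($buffer) if $keep > length($buffer);
    return $keep > 0 ? substr($buffer, -$keep) : '';
}

print trim_window("WWWADBBBABC", "AB"), "\n";
print trim_window("WWWA",        "AB"), "\n";
```

Once an AB has been seen, nothing after it is ever discarded, which is exactly the limitation pointed out above: the buffer still grows until a Z shows up.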

        In summary, if you are finding searches slow, then you should perhaps be looking at doing the search less often, perhaps as a scheduled task or when the text grows by a set amount.

        Aah, now I get your idea. Not bad, but it's too simplified ;)

        Extracting a \w+ prefix from the expression doesn't work with patterns like this, for example:
        my $search="(AB)+.*Z";

        The problem is that the search string is given, so I have no influence on it. Maybe you were confused by my ".*ABC" example, but what I meant there was that in such a case there is no unmatchable prefix that can be cut away.
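A small sketch of why the \w+ extraction is too simple for arbitrary patterns. literal_prefix() is a hypothetical name for the extraction step used earlier in the thread, and the pattern AB?C is an extra example added here for illustration.

```perl
use strict;
use warnings;

# literal_prefix() is a hypothetical wrapper around the \w+ extraction.
sub literal_prefix {
    my ($pattern) = @_;
    my ($prefix) = $pattern =~ /^(\w+)/;
    return $prefix;    # undef when the pattern does not start with \w
}

for my $pattern ('AB.*Z', '(AB)+.*Z', '.*ABC', 'AB?C') {
    my $prefix = literal_prefix($pattern);
    printf "%-10s => %s\n", $pattern, defined $prefix ? $prefix : '(none)';
}
```

For (AB)+.*Z and .*ABC nothing is extracted at all. For AB?C the result is AB, which is worse than nothing: the B is optional, so the text AC matches the pattern even though it does not contain AB, and trimming on AB could throw a real match away.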
