PerlMonks  

Find Prefix if regex didn't match

by demoralizer (Acolyte)
on Oct 31, 2012 at 11:16 UTC (#1001651=perlquestion)
demoralizer has asked for the wisdom of the Perl Monks concerning the following question:

Hi folks,

I've been using PerlMonks for years and have usually found good answers to my questions, but now I fear I have quite a hard one, which made it necessary to create an account and join you :)

Here it is:
I have a given regular expression and a GROWING string to be searched. Every time the string grows, I check whether the expression can now be matched. If the string grows character by character (and I don't know when it will grow next) and has become quite large, the searching becomes quite slow!

To speed this up, the solution I have in mind is cutting off the head of the string wherever it is impossible for the regular expression to match there. This sounds easy, because the string can be cut at the first start position from which the regexp engine reached the end of the string. But I have no idea how to get this information.

Here is a little example:
$search="AB.*Z"
$string="WWWA" -> we can cut WWW

Now the string grows...
$string="WWWADBBBABC" -> we can cut WWWADBBB

and so on...

Awkward expressions like $search=".*ABC" are much harder to handle, but that's not so important, so I will ignore such special cases. To keep it simple: if the regexp engine reaches the end of the string while matching, use the position where it started that attempt as the cut position.
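The cut rule from the example can be sketched for the simple case where every match must begin with a fixed literal (here "AB", the head of "AB.*Z"). This is only an illustration under that assumption, not a general partial-match facility of the regexp engine; the helper name safe_cut_offset is made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: returns how many leading characters of $string can be
# dropped, assuming every match has to start with the literal text $head.
sub safe_cut_offset {
    my ($string, $head) = @_;

    # The head occurs completely: a match may start there, so keep from there on.
    my $at = index($string, $head);
    return $at if $at >= 0;

    # Otherwise look for the longest tail of $string that is a proper prefix
    # of $head; the longest tail gives the earliest possible match start.
    my $max = length($head) - 1;
    $max = length($string) if $max > length($string);
    for (my $k = $max; $k > 0; $k--) {
        return length($string) - $k
            if substr($string, -$k) eq substr($head, 0, $k);
    }

    # Nothing at the end looks like the start of $head: everything can go.
    return length($string);
}

my $string = "WWWADBBBABC";
my $cut    = safe_cut_offset($string, "AB");
print "can cut: ", substr($string, 0, $cut), "\n";   # the "WWWADBBB" from the example
```

For "WWWA" the same helper returns 3, so "WWW" can be cut and the trailing "A" is kept because it may still become "AB".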

Any ideas?

Re: Find Prefix if regex didn't match
by Anonymous Monk on Oct 31, 2012 at 11:20 UTC
      Thanks for your fast answer!

      Some good approaches, but no hit...

      pos() doesn't work because if the string only contains a prefix of the given expression (what will be the hot case I'm looking for) I will get undef and not the position I need or do I overlook sth. here?

      My "window" can become as large as it likes, that doesn't matter, but I have to ensure that no match is missed, that's quite important! It's just to make the searching faster: knowing that the first n characters can be thrown away because they can never be part of a match would do the job. Any alternative suggestions?

        pos() doesn't work because if the string only contains a prefix of the given expression (what will be the hot case I'm looking for) I will get undef and not the position I need or do I overlook sth. here?

        Once again in english, please?

        Any alternative suggestion?

        Not really. To remove a prefix requires a regex match. And then you do real matching. I doubt there is any savings to be had by matching twice ... or actually cutting the string, even with pos

        my $search="AB.*Z";
        my $string="WWWA";

        my $search_prefix = $1 if $search =~ /^(\w+)/g;
        warn $search_prefix;
        my $prefix_offset = index ( $string, $search_prefix );
        substr $string, 0, $prefix_offset , '';
        warn $string;
        $string = "WWWADBBBABC";
        $prefix_offset = index ( $string, $search_prefix );;
        warn $string;
        substr $string, 0, $prefix_offset , '';
        warn $string;
        $string = "WWWADBBBABC";
        pos( $string ) = $prefix_offset;
        warn pos( $string ); ## next match m//g starts at offset
        __END__
        AB at jank line 5.
        A at jank line 8.
        WWWADBBBABC at jank line 11.
        ABC at jank line 13.
        8 at jank line 16.

        For some idea of why I think so, maybe see "Why does global match run faster than none global?" and "Multiple Regex evaluations or one big one?"
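The pos() idea from the code above can be sketched without cutting the string at all: set pos() to the offset and match with /g in scalar context, so the engine starts looking there. The helper name match_from and the sample string are made up for this sketch; whether this is actually faster than cutting would have to be measured.

```perl
use strict;
use warnings;

# Hypothetical helper: search $$text_ref for $re, starting at $offset instead
# of at the beginning. Returns the start offset of the match, or undef.
sub match_from {
    my ($text_ref, $re, $offset) = @_;

    pos($$text_ref) = $offset;            # the next m//g starts at this offset
    return $$text_ref =~ /$re/g ? $-[0]   # $-[0] is the start of the match
                                : undef;  # a failed m//g resets pos()
}

my $string = "ABZ---ABxxZ";
my $re     = qr/AB.*Z/;

my $from_start = match_from(\$string, $re, 0);   # match begins at offset 0
my $from_one   = match_from(\$string, $re, 1);   # first "AB" is skipped
my $from_seven = match_from(\$string, $re, 7);   # no "AB" left to find

print "from 0: $from_start, from 1: $from_one, from 7: ",
      defined $from_seven ? $from_seven : "no match", "\n";
```

The string is passed by reference because pos() belongs to the variable itself, not to a copy of its value.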

Re: Find Prefix if regex didn't match
by Anonymous Monk on Oct 31, 2012 at 13:27 UTC
      I try to match only once... but to be faster I try to cut the unmatchable stuff from the beginning of my search string so that the next attempt needn't start from the beginning again.

      At the moment there are some doubts whether cutting really makes things better, but that is what I see here.
Re: Find Prefix if regex didn't match
by greengaroo (Hermit) on Oct 31, 2012 at 13:34 UTC

    Hello demoralizer and welcome to PerlMonks. May I ask you to post some (working) code here? Just put what you already came up with, even if it is slow, at least we will get a better idea of the context, then we can figure a way to optimize it. Thanks!

    There are no stupid questions, but there are a lot of inquisitive idiots.
      Hi greengaroo,

      Showing working code will be a little bit hard because I'm listening on sockets... but here is something that should at least show what I'm trying to do:
      # $term is the socket
      # $text contains all received and unscanned text
      # $scanned contains all scanned text
      my $rec;
      my $time = gettimeofday();
      while(1) {
          # sth. to read?
          if (read($term, $rec, 0xFFFF)) {
              # collect what have been read
              $text .= $rec;
              # expression found?
              if ($text =~ s/(.*)($expect)//s) {
                  $scanned .= $1;
                  $scanned .= "MATCH";
                  $scanned .= $2;
                  return 0;
              }
              # shorten string for speed up
              elsif (length($text) >= 20) {
                  $scanned .= "CUTTED";
                  $scanned .= substr($text, 0, length($text) - 20 + 1);
                  $text = substr($text, length($text) - 20 + 1);
              }
          }
          # timeout?
          if (gettimeofday() - $time > $timeout) {
              $scanned .= "TIMEOUT";
              return 1;
          }
      }
      It seems that the "elsif (length($text) >= 20)" branch makes things faster, but it doesn't do exactly what I want, because this way I can lose possible matches :(
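How the fixed 20-character window can lose a match is easy to reproduce without a socket. In this sketch the received data is simulated by a list of chunks; the chunk contents, the pattern and the helper name scan are assumptions chosen only for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the socket: the data arrives in three chunks.
my @chunks = ('xxAB', '-' x 30, 'Z');
my $expect = qr/AB.*Z/s;

# Feed the chunks into a buffer and report whether $expect was ever found.
# $window is the cut size; 0 means the buffer is never cut.
sub scan {
    my ($window) = @_;
    my $text = '';
    for my $rec (@chunks) {
        $text .= $rec;
        return 1 if $text =~ $expect;
        # Same cut as in the loop above: keep only the last $window - 1 characters.
        $text = substr($text, length($text) - $window + 1)
            if $window && length($text) >= $window;
    }
    return 0;
}

my $found_uncut = scan(0);
my $found_cut   = scan(20);
print "without cut: $found_uncut, with 20-char window: $found_cut\n";
```

With the cut, the "AB" has already been thrown away by the time the "Z" arrives, so the match that starts more than 19 characters before the end of the buffer is never seen.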
