
Find Prefix if regex didn't match

by demoralizer (Sexton)
on Oct 31, 2012 at 11:16 UTC (#1001651=perlquestion)
demoralizer has asked for the wisdom of the Perl Monks concerning the following question:

Hi folks,

I have been "using" PerlMonks for years, and most of the time I've found good answers to my questions, but now I fear I have quite a hard one, which made it necessary to create an account and join you :)

Here it is:
I have a given regular expression and a GROWING string to search through. Every time the string grows, I check whether the given expression can be matched now. The string grows character by character (and I don't know when it will grow next), so once it becomes quite large, the search becomes quite slow!

To speed this up, the solution I have in mind is to cut off the head of the string wherever the regular expression cannot possibly be found. This sounds easy, because the string can be cut at the first start position from which the regex engine reached the end of the string. But I have no idea how to get this information.

Here a little example:
$string="WWWA" -> we can cut WWW

Now the string grows...
$string="WWWADBBBABC" -> we can cut WWWADBBB

and so on...

Stupid expressions like $search=".*ABC" are much harder to handle, but that's not so important, so I will ignore such special cases. Just keep it simple: if the regex engine reaches the end of the string while matching, use the position where that attempt started as the place to cut the string.
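As far as I know, Perl's regex engine has no direct way to report where an attempt ran into the end of the string, so here is a conservative approximation for the simple case only. A minimal sketch, assuming a pattern like /AB.*Z/ (the one used in the reply further down) whose every match must begin with a known literal; `safe_cut_offset` and its `$lit` argument are hypothetical names, not an existing API.

```perl
use strict;
use warnings;

# Hypothetical helper: how many leading characters of $string can be dropped.
# ASSUMPTION: every match of the real pattern must begin with the literal
# $lit (e.g. "AB" for /AB.*Z/), and the pattern has no lookbehind, \b or
# anchors that would need the discarded text.
sub safe_cut_offset {
    my ($string, $lit) = @_;

    # A full occurrence of the literal: a match could still start here.
    my $i = index($string, $lit);
    return $i if $i >= 0;

    # No full occurrence: keep the longest tail of $string that is a proper
    # prefix of $lit, because more input may complete it.
    my $max = length($lit) - 1;
    $max = length($string) if length($string) < $max;
    for my $n (reverse 1 .. $max) {
        return length($string) - $n
            if substr($string, -$n) eq substr($lit, 0, $n);
    }

    # Nothing here can start a match: the whole string may go.
    return length($string);
}

for my $s ("WWWA", "WWWADBBBABC") {
    my $cut = safe_cut_offset($s, "AB");
    print "$s: cut ", substr($s, 0, $cut), ", keep ", substr($s, $cut), "\n";
}
```

For WWWA this drops WWW, and for WWWADBBBABC it drops WWWADBBB, the same cuts as in the example. A pattern like .*ABC has no literal prefix, so this sketch does not apply to it, which fits the decision to ignore such cases.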

Any ideas?

Re: Find Prefix if regex didn't match
by Anonymous Monk on Oct 31, 2012 at 11:20 UTC
      Thanks for your fast answer!

      Some good approaches, but no hit...

      pos() doesn't work: if the string contains only a prefix of a match for the given expression (which is exactly the interesting case I'm looking for), I get undef instead of the position I need. Or am I overlooking something here?

      My "window" can become as large as it likes, that doesn't matter, but I have to ensure that no match is ever missed; that's quite important! The point is only to make the search faster, and knowing that the first n characters can be thrown away because they can never be part of a match would do the job. Any alternative suggestion?
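The pos() behaviour described above can be seen in a few lines: after a failed m//g, pos() is reset to undef instead of telling you how far the engine got, and with the /c modifier a failed match merely keeps the previous value. The string and pattern are made up for illustration.

```perl
use strict;
use warnings;

my $s = "WWWA";

# A failed m//g resets pos() to undef; it does not report how far the
# engine got before giving up.
$s =~ /AB.*Z/g;
print "after failed /g: ", (defined pos($s) ? pos($s) : 'undef'), "\n";

# With /c a failed match leaves pos() untouched (here: the value set by hand).
pos($s) = 3;
$s =~ /AB.*Z/gc;
print "after failed /gc: ", pos($s), "\n";
```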

        pos() doesn't work: if the string contains only a prefix of a match for the given expression (which is exactly the interesting case I'm looking for), I get undef instead of the position I need. Or am I overlooking something here?

        Once again in English, please?

        Any alternative suggestion?

        Not really. Removing a prefix requires a regex match, and then you do the real matching. I doubt there are any savings to be had by matching twice ... or by actually cutting the string, even with pos

        my $search = "AB.*Z";
        my $string = "WWWA";
        my ($search_prefix) = $search =~ /^(\w+)/;   # "AB"
        warn $search_prefix;
        my $prefix_offset = index( $string, $search_prefix );   # -1: "AB" not found
        substr $string, 0, $prefix_offset, '';   # negative length: removes all but the last char
        warn $string;
        $string = "WWWADBBBABC";
        $prefix_offset = index( $string, $search_prefix );
        warn $string;
        substr $string, 0, $prefix_offset, '';
        warn $string;
        $string = "WWWADBBBABC";
        pos( $string ) = $prefix_offset;
        warn pos( $string );   ## next match m//g starts at offset
        __END__
        AB at jank line 5.
        A at jank line 8.
        WWWADBBBABC at jank line 11.
        ABC at jank line 13.
        8 at jank line 16.

        For some idea of why I think so, see maybe Why does global match run faster than none global? and Multiple Regex evaluations or one big one?

Re: Find Prefix if regex didn't match
by Anonymous Monk on Oct 31, 2012 at 13:27 UTC
      I try to match only once... but to be faster, I try to cut unmatchable stuff from the beginning of my search string, so that the next attempt doesn't need to start from the beginning again.

      At the moment there are some doubts whether cutting really makes things better, but that is what I see here.
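Whether cutting pays off can be measured rather than guessed. A rough sketch with the core Benchmark module; the buffer size, its contents and the /AB.*Z/ pattern are made-up assumptions, so whatever numbers it prints only describe this toy case.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up data: a long buffer with no match yet ('Z' has not arrived) and the
# tail that a safe cut would leave behind.
my $expect = qr/AB.*Z/s;
my $long   = ('W' x 100_000) . 'ABC';
my $tail   = substr($long, -3);    # 'ABC'

# Run each failing search for about one CPU second and print a comparison table.
cmpthese(-1, {
    full_buffer => sub { my $hit = $long =~ $expect; },
    cut_tail    => sub { my $hit = $tail =~ $expect; },
});
```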
Re: Find Prefix if regex didn't match
by greengaroo (Hermit) on Oct 31, 2012 at 13:34 UTC

    Hello demoralizer, and welcome to PerlMonks. May I ask you to post some (working) code here? Just post what you have come up with so far, even if it is slow; at least we will get a better idea of the context, and then we can figure out a way to optimize it. Thanks!

    There are no stupid questions, but there are a lot of inquisitive idiots.
      Hi greengaroo,

      Showing working code will be a little bit hard because I'm listening on sockets... but here is something that should at least show what I'm trying to do:
      use Time::HiRes qw(gettimeofday);   # gettimeofday() as floating seconds in scalar context

      # $term is the socket
      # $text contains all received and unscanned text
      # $scanned contains all scanned text
      my $rec;
      my $time = gettimeofday();
      while (1) {
          # anything to read? (sysread returns what is available; read() is
          # buffered and would wait on the socket for all 0xFFFF bytes)
          if (sysread($term, $rec, 0xFFFF)) {
              # collect what has been read
              $text .= $rec;
              # expression found?
              if ($text =~ s/(.*)($expect)//s) {
                  $scanned .= $1;
                  $scanned .= "MATCH";
                  $scanned .= $2;
                  return 0;
              }
              # shorten string to speed up: keep only the last 19 characters
              elsif (length($text) >= 20) {
                  $scanned .= "CUTTED";
                  $scanned .= substr($text, 0, length($text) - 20 + 1);
                  $text = substr($text, length($text) - 20 + 1);
              }
          }
          # timeout?
          if (gettimeofday() - $time > $timeout) {
              $scanned .= "TIMEOUT";
              return 1;
          }
      }
      It seems that the "elsif (length($text) >= 20)" branch makes things faster, but it doesn't do exactly what I want, because this way I can lose possible matches :(
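The pos() example in the first reply hints at an alternative to the fixed 20-character cut: leave the buffer intact and only move where the next search starts. A minimal sketch of that idea, assuming (as with /AB.*Z/) that every match begins with a known literal; `$lit`, the chunks and the offset rule are illustrative assumptions, not part of the code above. Because nothing is thrown away, a match that straddles an old cut boundary cannot be lost.

```perl
use strict;
use warnings;

# Sketch, not the poster's code: instead of cutting the buffer, remember a
# "safe" offset and let m//g start there by setting pos().
# ASSUMPTION: every match of $expect begins with the literal $lit (true for
# /AB.*Z/), and $expect has no lookbehind or \b that needs earlier text.
my $expect = qr/AB.*Z/s;
my $lit    = 'AB';

my $text   = '';
my $resume = 0;
my $found;
for my $chunk ('WWWA', 'DBBBABC', 'XYZ') {   # made-up socket reads
    $text .= $chunk;
    pos($text) = $resume;                     # m//g starts searching here
    if ($text =~ /($expect)/g) {
        $found = $1;
        print "match '$found' ending at offset ", pos($text), "\n";
        last;
    }
    # No match yet. A later match can only start at the first full $lit at or
    # after $resume, or within the last length($lit)-1 characters.
    my $i    = index($text, $lit, $resume);
    my $tail = length($text) - length($lit) + 1;
    $resume  = $i >= 0 ? $i : ($tail > $resume ? $tail : $resume);
    print "no match in '$text' yet, next search starts at $resume\n";
}
```

With these made-up chunks the search resumes at offset 3, then at 8, and finally matches ABCXYZ.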

Node Type: perlquestion [id://1001651]
Approved by Corion
Front-paged by MidLifeXis