Re: Help with timeout

by Marshall (Abbot)
on Sep 25, 2012 at 19:37 UTC (#995625)

in reply to Help with timeout

I think kcott has a fine post on how to use alarm(). However, it would seem to me that a better solution would be to figure out why this thing is so darn slow and fix that so that you get a result all of the time without having to "give up".

I looked at the first couple of regexes (see below). When you use the /x modifier, you can space a regex out over multiple lines, and this can improve the readability a lot. You can also add comments to the lines, although there are some limitations on what can go in a #comment (see the perlre doc for details), and you cannot put a space inside a two-character token like the ?: in (?: ..the non-capture..). Still, this #comment stuff can be useful.
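To make the /x idea concrete, here is a minimal sketch. The pattern is a made-up, much smaller stand-in for yours (the $heading and $sample names and the sample text are my own assumptions, not taken from your code); it only shows the layout and the comment style.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical pattern, for illustration only: matches headings such as
# "Item 1." or "ITEM 1A:" and captures the item number.
my $heading = qr{
    (?:Item|ITEM)      # the word "Item" in either spelling
    \s?                # optional single whitespace character
    ( \d+ [AB]? )      # capture the item number, e.g. 1, 1A, 1B
    \s?
    [.:,-]?            # optional trailing punctuation
}x;

my $sample = "ITEM 1A: Risk Factors";
my ($number) = $sample =~ $heading;
print "matched item $number\n" if defined $number;
```

Note that the x goes after the closing delimiter of qr{}, and that the (?: stays together as one token even though everything around it is spaced out.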

I see some strange things (there appear to be terms that have no purpose). Also, $data (maybe 10MB) is slurped into memory as a single variable and many regexes are applied serially to this humongous thing. Parsing, re-parsing, re-parsing and re-parsing something big is often not a good idea performance-wise.

Often, parsing something very large is best done line by line and ONLY once. Read a line, deal with it, throw it away because we are done....
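A rough sketch of that read-a-line, deal-with-it, throw-it-away loop follows. I don't know your file format, so the sample text and the heading pattern are assumptions, and an in-memory filehandle stands in for the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the real file: an in-memory filehandle over sample text.
# For a real file, open the path instead of the scalar reference.
my $sample = join "\n",
    'PART I',
    'Item 1. Business',
    'We make widgets.',
    'Item 1A. Risk Factors',
    'Widgets are risky.',
    '';
open my $fh, '<', \$sample or die "cannot open sample: $!";

my @headings;
while ( my $line = <$fh> ) {
    chomp $line;
    # Deal with the line once; it goes out of scope at the end of the loop body.
    if ( $line =~ /^\s*Item\s+(\d+[AB]?)/i ) {
        push @headings, "line $.: item $1";    # $. is the current line number
    }
}
close $fh;

print "$_\n" for @headings;
```

Only one line is held in memory at a time, and each line is matched against a short anchored pattern instead of a long regex roaming over 10MB.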

I suspect that if you shared some more details about the file format and why one of these things is 10MB, far more efficient algorithms could be devised. Your regexes appear to do very similar things. A single pass that figures everything out in one go would be faster. It could even be that an algorithm that just stops reading the file once we've got what we need is appropriate.
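Here is a sketch of what "stop reading once we've got what we need" could look like. It assumes (and this is only my guess at your format) that the wanted text sits between an "Item 1" heading line and the next "Item 1A/1B/2" heading line; $start, $stop and the sample text are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed layout (not confirmed by the original poster): the wanted text
# sits between an "Item 1" heading and the next "Item 1A/1B/2" heading.
my $sample = join "\n",
    'PART I',
    'Item 1. Business',
    'We make widgets.',
    'We sell them worldwide.',
    'Item 1A. Risk Factors',
    'Widgets are risky.',
    '';
open my $fh, '<', \$sample or die "cannot open sample: $!";

my $start = qr/^\s*Item\s*1\b/i;              # "Item 1" but not "Item 1A"
my $stop  = qr/^\s*Item\s*(?:1A|1B|2)\b/i;    # the heading that ends the section

my $in_section = 0;
my @section;
while ( my $line = <$fh> ) {
    if ( !$in_section ) {
        $in_section = 1 if $line =~ $start;   # heading line itself is not kept
        next;
    }
    last if $line =~ $stop;    # got what we need: stop reading the file
    push @section, $line;
}
close $fh;

print @section;
```

The rest of the file after the stop heading is never read, so a section near the top of a 10MB file costs almost nothing. If the headings can be split across lines, this simple two-state approach would need adjusting.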

While I was playing with this, I spaced your regexes out (that is what the /x allows). I also show how to use the YAPE::Regex::Explain module, which is sometimes useful.

So some alternate ways to space out the regexes to increase readability are shown below.

I do suspect that the "real solution" is to make this so fast that there is never any need for a 2 minute timeout! But there are some things about your application that I and others just don't understand. It would be most helpful if you could clarify further!

#!/usr/bin/perl -w
use strict;
use YAPE::Regex::Explain;

# prototype from the docs...
# my $exp = YAPE::Regex::Explain->new($REx)->explain;

# Note: the x modifier goes after the closing delimiter of qr{}.
# Writing qr{m/.../x} would make "m/" and "/x" literal parts of the pattern.
my $REx1 = qr{(?:Item|ITEM)[Ss]?\s?
              (?:\.|\-|:|\-\-|\,)?\s?
              (?:1|I)\s?
              (?:\.|\-|\:|\-\-|\,)?\s?
              (?:Description|DESCRIPTION)?\s?
              (?:[Oo][Ff])?\s?(?:[Tt][Hh][Ee])?\s?
              (?:Busine\s?ss|BUSINE\s?SS|Company|COMPANY)\s?
              (?:\.|\-|:|\-\-|\,|\()?
              (.*?)\s?
              (?:Item|ITEM)[Ss]?\s?
              (?:\.|\:|\-\-|\-|\,)?\s?
              (?:I|1A|1B|2)\s?
              (?:\.|\:|\-\-|\-|\,)?    # this term apparently not needed:
                                       # not captured and it is optional
             }x;

my $REx2 = qr{(?:Business\s?Development|BUSINESS\s?DEVELOPMENT)\s?
              (.*?)\s?
              (?:Item|ITEM)[Ss]?\s?
              (?:\.|\-|:|\-\-|\,)?\s?
              (?:I|1A|1B|2)\s?
              (?:\.|\:|\-\-|\-|\,)?
             }x;

my $REx3 = qr{(?:PART|Part)\s?
              (?:\.|\-|\:|\-\-|\,)?\s?
              (?:I|1)\s?
              (?:\.|\-|:|\-\-|\,)?\s?
              (?:BUSINESS|Business|GENERAL|general)
              (.*?)\s?
              (?:Item|ITEM)[Ss]?\s?
              (?:I|1A|1B|2|3)\s?
              (?:\:|\-|\,|\-\-|\.|\,)?
             }x;

print YAPE::Regex::Explain->new($REx1)->explain;

Update: here is an example of an inefficiency:
It's gonna search the whole 10MB to figure out that this match does not exist. I suspect that there is a far faster way to do this job. Maybe that's not possible, but I doubt that. I think you should be asking the Monks how to make your algorithm run so darn fast that this 2 minute timeout is irrelevant. I would not be surprised if the total time to get all results could be made 5-10x faster. Without knowing more I certainly can't guarantee that, but if I were in Vegas, I would put some money down on that proposition. But you have to explain more: not enough information is known.
