
how to make a regex loop efficient?

by mavili (Initiate)
on Oct 16, 2012 at 03:23 UTC
mavili has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I've got the code below and it takes ages (more than 40 seconds) to run. I'm looking for ways to make it more efficient, if possible.

The array @files holds around 450 elements, and LOG is a large file of tens of thousands of lines, which I read into the string $log (this is faster than iterating through the lines of LOG):

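    local $/ = undef;    # slurp mode: <LOG> returns the whole file at once
    open(LOG, '<', 'log_file') or die "couldn't open log_file: $!";
    binmode LOG;
    my $log = <LOG>;
    foreach my $file (@files) {
        # =()= forces list context, so each line counts every match in $log
        $file_counts{$file}[0] = () = $log =~ /<regexp1>/gi;
        $file_counts{$file}[1] = () = $log =~ /<regexp2>/gi;
    }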
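The counting in the loop relies on the =()= idiom. A minimal, self-contained illustration of how it works (the sample string is made up for the demo): a /g match in list context returns every match, and assigning that list to the empty list () in scalar context yields the number of elements. Each such statement walks the entire target string, so a loop with two of them per file makes two complete passes over $log for every entry in @files, roughly 900 passes for 450 files.

```perl
use strict;
use warnings;

# Made-up sample text for the demonstration.
my $text = "a.pdf b.pdf c.txt d.PDF";

# The match with /g in list context returns every matched substring;
# assigning that list to () and then to a scalar gives the element count.
my $count = () = $text =~ /\.pdf\b/gi;
print "$count\n";    # prints 3
```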

Can anyone suggest some improvements please?

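EDIT: Problem solved. I was doing it completely wrong: I read the whole log file into a string and then scanned that string twice for each of the roughly 450 files, about 900 full passes in total. The solution was to go through the log in a loop after all, but in a different way: read it line by line once, pull the requested file name out of each line with a single regex, and look that name up in the hash. The solution is more or less outlined below: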

    while (<LOG>) {
        # only names that are already keys of %file_counts get counted
        if (/GET\s.*\/(\S+)\.pdf\s/i and exists $file_counts{$1}) {
            $file_counts{$1}[0]++;
        }
    }

LOG is an HTTP access log, and the regex above matches HTTP requests for .pdf files, capturing the file name without its directory or extension. For example, the following line is a successful match: it captures "file", and if "file" is a key in the hash, its count is incremented.

GET /folder1/folder2/file.pdf HTTP ..etc
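A self-contained sketch of the single-pass approach above, for trying it out. The names in @files and the sample log lines are made up for the demo, and the names are assumed to be stored without the .pdf extension, since that is what the regex captures. In real use the foreach over @log_lines would be a while loop reading the log filehandle line by line.

```perl
use strict;
use warnings;

# Hypothetical names to count (the thread's @files), stored without ".pdf".
my @files = ('file', 'report');

# Seed the hash so that `exists` only accepts names we care about.
# Two slots mirror the original [0]/[1] counts; only [0] is used here.
my %file_counts = map { $_ => [0, 0] } @files;

# Made-up stand-in for the access log.
my @log_lines = (
    'GET /folder1/folder2/file.pdf HTTP/1.1',
    'GET /docs/report.pdf HTTP/1.0',
    'GET /docs/other.pdf HTTP/1.1',
    'GET /folder1/file.html HTTP/1.1',
    'GET /folder1/folder2/file.pdf HTTP/1.1',
);

foreach (@log_lines) {
    # One regex per line captures the basename of a requested .pdf;
    # a hash lookup replaces running one regex per file name.
    if (/GET\s.*\/(\S+)\.pdf\s/i and exists $file_counts{$1}) {
        $file_counts{$1}[0]++;
    }
}

# prints "file: 2" then "report: 1"
printf "%s: %d\n", $_, $file_counts{$_}[0] for sort keys %file_counts;
```

The greedy .* backtracks to the last slash that is followed by a name ending in .pdf and whitespace, so the slash in "HTTP/1.1" does not end up in the capture.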

Re: how to make a regex loop efficient?
by Anonymous Monk on Oct 16, 2012 at 03:44 UTC

    Can anyone suggest some improvements please?

    The answer is no, not without seeing the regexes.

      Thanks for your reply. I found the problem and edited the post. Cheers.
Re: how to make a regex loop efficient?
by BrowserUk (Pope) on Oct 16, 2012 at 03:49 UTC

    If you disclose what regexp1 and regexp2 look like, and how they are derived from $file (presumably they are not constants, otherwise you would be calculating the same two counts 450 times each), then it may be possible to see some way of speeding up the processing.
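One common way to act on that kind of suggestion, assuming the per-file patterns differ only in the file name they embed (an assumption, since the thread never shows regexp1 or regexp2), is to join all the names into a single alternation so the log is scanned once instead of once per file and pattern. This is a swapped-in technique, not something proposed in the thread; the names and log text below are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical inputs: names to count and a small in-memory log.
my @files = ('file', 'report', 'a+b');
my $log = join "\n",
    'GET /x/file.pdf HTTP/1.1',
    'GET /x/a+b.pdf HTTP/1.1',
    'GET /y/report.pdf HTTP/1.1',
    'GET /y/file.pdf HTTP/1.1',
    'GET /y/unrelated.pdf HTTP/1.1';

# Build one alternation; quotemeta escapes metacharacters such as '+'.
my $alt = join '|', map { quotemeta($_) } @files;

# Case-sensitive on purpose: with /i, $1 could come back as e.g. "FILE"
# and would need normalising (lc) before the hash lookup.
my $re = qr{/($alt)\.pdf\s};

my %count = map { $_ => 0 } @files;

# A single pass over the string; $1 says which name each match found.
while ($log =~ /$re/g) {
    $count{$1}++;
}

# prints "a+b: 1", "file: 2", "report: 1", one per line
print "$_: $count{$_}\n" for sort keys %count;
```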

