how to make a regex loop efficient?

by mavili (Initiate)
on Oct 16, 2012 at 03:23 UTC
mavili has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I've got the code below and it takes ages (more than 40 seconds) to run. I'm looking for ways to make it more efficient, if possible.

The array @files has around 450 elements, and LOG is a large file containing tens of thousands of lines. It is read into the string $log in one go (this is faster than iterating through the lines of LOG):

local $/ = undef;
open(LOG, 'log_file') or die("couldn't open file");
binmode LOG;
$log = <LOG>;
foreach $file (@files){
    $file_counts{$file}[0] =()= $log =~ /<regexp1>/gi;
    $file_counts{$file}[1] =()= $log =~ /<regexp2>/gi;
}

Can anyone suggest some improvements please?

EDIT: Problem solved. I was doing it completely wrong by putting the log file into a string. The solution was to go through the log line by line in a loop, but in a different way: match each line once and look up the captured file name in the hash. The solution is more or less outlined below:

while(<LOG>){
    if(/GET\s.*\/(\S+)\.pdf\s/i and exists $file_counts{$1}){
        $file_counts{$1}[0]++;
    }
}

LOG is an HTTP access log, and the regex above matches HTTP requests for .pdf files. So a line like the following would match, and the count for that file would be incremented in the hash (provided the captured name is already a key in %file_counts):

GET /folder1/folder2/file.pdf HTTP ..etc
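
For anyone who wants to try the single-pass approach, here is a minimal self-contained sketch of it. The sample log lines, the names in @files, and the in-memory filehandle are all made up for illustration; the real script reads from the actual access log. Note that the capture group excludes the ".pdf" extension, so the hash keys are assumed to be bare file names.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real access log.
my $sample_log = <<'END';
127.0.0.1 - - [16/Oct/2012:03:23:00 +0000] "GET /folder1/folder2/file.pdf HTTP/1.1" 200 1234
127.0.0.1 - - [16/Oct/2012:03:23:01 +0000] "GET /folder1/other.pdf HTTP/1.1" 200 99
127.0.0.1 - - [16/Oct/2012:03:23:02 +0000] "GET /folder3/file.pdf HTTP/1.1" 200 1234
127.0.0.1 - - [16/Oct/2012:03:23:03 +0000] "GET /index.html HTTP/1.1" 200 512
END

# Hypothetical file names; keys carry no ".pdf" because the regex captures only the base name.
my @files = ('file', 'report');
my %file_counts = map { $_ => [0, 0] } @files;

# Read from the string as if it were a file (an in-memory filehandle).
open(my $log_fh, '<', \$sample_log) or die "couldn't open log: $!";

# One pass over the log: one regex match per line, then a hash lookup
# instead of one regex scan per file name.
while (my $line = <$log_fh>) {
    if ($line =~ /GET\s.*\/(\S+)\.pdf\s/i and exists $file_counts{$1}) {
        $file_counts{$1}[0]++;
    }
}
close($log_fh);

printf "%s: %d\n", $_, $file_counts{$_}[0] for sort keys %file_counts;
```

The work per line no longer depends on the size of @files, which is presumably where most of the 40 seconds went in the original version.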

Re: how to make a regex loop efficient?
by Anonymous Monk on Oct 16, 2012 at 03:44 UTC

    Can anyone suggest some improvements please?

    The answer is no, not without seeing the regex.

      thanks for your reply. I found the problem and edited the post. cheers
Re: how to make a regex loop efficient?
by BrowserUk (Pope) on Oct 16, 2012 at 03:49 UTC

    If you disclose what regexp1 & regexp2 look like, and how they are derived from $file, then it may be possible to see some way of speeding up the processing. Presumably they are not constants; otherwise you would be calculating the same two counts 450 times each.
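
    As an illustration only: if the patterns turned out to be a fixed template with the file name embedded in it (an assumption, since the thread never shows regexp1 or regexp2), the 450 scans of $log could be folded into one scan with a single alternation. The template, the file names, and the log text below are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical inputs: the real patterns and file names are not shown in the thread.
my @files = ('file', 'report', 'a.b');
my $log   = join "\n",
    'GET /folder1/file.pdf HTTP/1.1',
    'GET /folder2/report.pdf HTTP/1.1',
    'GET /folder1/file.pdf HTTP/1.1',
    'GET /folder1/aXb.pdf HTTP/1.1',
    '';

# Build one alternation of all names. quotemeta keeps characters such as '.'
# literal; longer names are listed first so they are tried before shorter ones.
my $names = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @files;

# Compile the combined pattern once, outside any loop.
my $re = qr/GET\s\S*\/($names)\.pdf\s/;

# Single scan of the slurped log: each match bumps the count of the captured name.
my %file_counts = map { $_ => 0 } @files;
$file_counts{$1}++ while $log =~ /$re/g;

printf "%s: %d\n", $_, $file_counts{$_} for sort keys %file_counts;
```

    Whether this helps depends entirely on what the real patterns look like; if they cannot be expressed as one template around the name, this sketch does not apply.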

