how to make a regex loop efficient?

by mavili (Initiate)
on Oct 16, 2012 at 03:23 UTC ( #999209=perlquestion )
mavili has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, i've got the code below and it takes ages (>40 seconds) to run. I'm looking for ways to make it more efficient if possible.

The array @files is around 450 elements, and the file LOG is a large file containing tens of thousands of lines, which is then read into the string $log (this is faster than iterating through the lines of LOG):

    local $/ = undef;
    open(LOG, 'log_file') or die("couldn't open file");
    binmode LOG;
    $log = <LOG>;
    foreach $file (@files){
        $file_counts{$file}[0] =()= $log =~ /<regexp1>/gi;
        $file_counts{$file}[1] =()= $log =~ /<regexp2>/gi;
    }

Can anyone suggest some improvements please?

EDIT: problem solved. I was doing it completely wrong by putting the log file into a string. The solution was to go through the log in a loop, HOWEVER in a different way. The solution is more or less outlined below:

    while(<LOG>){
        if(/GET\s.*\/(\S+)\.pdf\s/i and exists $file_counts{$1}){
            $file_counts{$1}[0]++;
        }
    }

The LOG file is an HTTP access log and the regex above matches HTTP requests for .pdf files. So the following line would be a successful match, and the count for "file" would be incremented in the hash (provided that key already exists):

GET /folder1/folder2/file.pdf HTTP ..etc
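
For anyone who wants to try the single-pass idea without a real access log, here is a minimal, self-contained sketch. The sample file names, the in-memory @log_lines array and the count_pdf_requests helper are made up for illustration; the original reads from the LOG filehandle and @files has around 450 entries. It assumes the hash is pre-seeded with the file names of interest, which is what the "exists" check relies on.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for @files and the LOG filehandle.
my @files     = qw(file report);
my @log_lines = (
    'GET /folder1/folder2/file.pdf HTTP/1.1',
    'GET /folder1/other.pdf HTTP/1.1',
    'GET /docs/FILE.PDF HTTP/1.1',
    'GET /index.html HTTP/1.1',
    'GET /a/report.pdf HTTP/1.1',
    'GET /b/report.pdf HTTP/1.1',
);

# Pre-seed the hash so that "exists" filters out files we do not care about.
my %file_counts = map { $_ => [0] } @files;

# Hypothetical helper: one pass over the lines, one regex match per line.
sub count_pdf_requests {
    my ($counts, $lines) = @_;
    for my $line (@$lines) {
        # $1 is the base name of the requested .pdf (without the extension).
        if ($line =~ m{GET\s.*/(\S+)\.pdf\s}i and exists $counts->{$1}) {
            $counts->{$1}[0]++;
        }
    }
    return $counts;
}

count_pdf_requests(\%file_counts, \@log_lines);

# Prints "file: 1" and "report: 2" for the sample data above.
printf "%s: %d\n", $_, $file_counts{$_}[0] for sort keys %file_counts;
```

Each line is matched once and the file name is looked up in the hash, instead of scanning the whole log string twice for every entry in @files. Note that the /i flag makes the match case-insensitive but the hash lookup is not, so FILE.PDF in the sample is matched by the regex yet not counted under "file".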

Replies are listed 'Best First'.
Re: how to make a regex loop efficient?
by BrowserUk (Pope) on Oct 16, 2012 at 03:49 UTC

    If you disclose what regexp1 & regexp2 look like -- and how they are derived from $file? Presumably they are not constants, otherwise you would be calculating the same two counts 450 times each -- then it may be possible to see some way of speeding up the processing.

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

Re: how to make a regex loop efficient?
by Anonymous Monk on Oct 16, 2012 at 03:44 UTC

    Can anyone suggest some improvements please?

    The answer is no, not without seeing the regex

      thanks for your reply. I found the problem and edited the post. cheers

Node Type: perlquestion [id://999209]
Approved by Corion