
Parsing Logs

by mdotpl (Initiate)
on Oct 05, 2012 at 16:30 UTC (#997512)
mdotpl has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, Here's where I'm at: Parsing auth log for a sessionID which will be used to extract particular request lines with the sessionID in message log. I'm having trouble with the foreach within a foreach loop. I can't figure out how to do this effectively. As I continue to tackle this, any advice / wisdom would be greatly appreciated! :)

open(AUTH) or die("couldn't open auth");
foreach $line (<AUTH>) {
    chomp($line);
    if($line =~ m/<my-regex-goes-here>/i) {
        push(@sessionID, $4);
    }
}
close(AUTH);

open(MSG) or die("couldn't open msg log");
foreach $line (<MSG>) {
    chomp($line);
    push(@array, $line);
}
close(MSG);

foreach $line (@array) {
    foreach $id (@sessionID) {
        # Look for auth request with sessionID
        if($line =~ m/<my-regex-goes-here($id)>/i) {
            push(@list, $line);
        }
    }
}

foreach $lines (@list) {
    print $lines;
    print "\n";
}

Re: Parsing Logs
by moritz (Cardinal) on Oct 05, 2012 at 16:59 UTC
    I'm having trouble with the foreach within a foreach loop. I can't figure out how to do this effectively.

    The trick is not to use two nested loops. Instead, use the session ID as a key in a hash; then, when iterating through @array, extract the session ID from each line and check whether it exists in the hash. That should reduce the runtime from O(@array * @sessionID) to something more along the lines of O(@array + @sessionID).

Re: Parsing Logs
by mbethke (Hermit) on Oct 05, 2012 at 17:03 UTC

    Hi mdotpl,
    it would probably help if you included a few lines of the format you're trying to parse. So here's just some general advice:

    It hardly ever makes sense to read files in foreach(). It evaluates <> in list context, causing the whole file to be read into memory at once even if you only want to keep a small part of it. That's particularly important with log files that tend to be large.

    The other thing is that you probably want a hash instead of an array there, to save the loop that searches for a matching ID later and speed up everything by orders of magnitude. So the first part should be:

    while(<AUTH>) {
        chomp;
        if(/<my-regex-goes-here>/i) {
            $sessionID{$4} = 1;
        }
    }

    Also note that if the session ID is in your fourth capture and you're not using the other three (or more), it would make sense not to capture in the first place. It reads easier and is faster.
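To make the point about captures concrete, here is a minimal sketch. The sample line and both patterns are assumptions for illustration only (the real log format had not been posted at this point in the thread), and LOGIN_LOCKED is a made-up event name used purely to show a non-capturing group.

```perl
use strict;
use warnings;

# Hypothetical auth-log line; the real format may differ.
my $line = '20120921 10:04:02.162 LOGIN_FAIL alice abc123';

# Style from the original post: four capture groups, session ID in $4.
my $id_from_fourth;
if ($line =~ /^(\d{8}) (\S+) (LOGIN_FAIL) \S+ (\S+)$/) {
    $id_from_fourth = $4;
}

# Capture only what is needed. (?:...) groups the alternation without
# capturing it, so the session ID is the first (and only) capture.
# LOGIN_LOCKED is an invented event name, not taken from the thread.
my ($id) = $line =~ /^\d{8} \S+ (?:LOGIN_FAIL|LOGIN_LOCKED) \S+ (\S+)$/;

print "$id\n";
```

Both forms find the same ID; the second just leaves less to count when reading the pattern.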

    With the hash, the rest can simply look like this (I've also renamed @list to reflect its purpose---"@list" doesn't really say anything as at-foo is always a list):

    open(MSG) or die("couldn't open msg log");
    while(<MSG>) {
        chomp;
        # List assignment: $id receives the first capture, and the
        # condition is false when the line does not match.
        if(my ($id) = /<my-regex-goes-here(<capture the id>)>/i) {
            push @matching_lines, $_ if $sessionID{$id};
        }
    }
    close(MSG);
    print $_, "\n" foreach (@matching_lines);

    If you don't need to do anything else with the lines, you could also omit the chomp (the regex will match anyway) and make the last bit a simple "print @matching_lines".

Re: Parsing Logs
by johngg (Abbot) on Oct 05, 2012 at 18:14 UTC

    You can also save some typing where you build your @array from the MSG file handle. Your

    foreach $line (<MSG>) {
        chomp($line);
        push(@array, $line);
    }

    could be written

    chomp( @array = <MSG> );

    It is good practice to put

    use strict;
    use warnings;

    at the top of your scripts to enforce a little coding discipline and catch typos. It is also good practice to use lexical file handles, the three-argument form of open and to check that it succeeded (you do this), but also give the o/s error (see $! in perlvar) to give a better idea of why it failed.

    open my $msgFH, q{<}, q{/some/file} or die qq{open: < /some/file: $!\n};

    I hope these points are helpful.



Re: Parsing Logs
by toolic (Bishop) on Oct 05, 2012 at 16:39 UTC
Re: Parsing Logs
by mdotpl (Initiate) on Oct 05, 2012 at 18:34 UTC

    Wow, thank you all for the pointers! I've got a lot to learn but you've all provided some very valuable information. That is only one part of the process I have to accomplish. I'll be working on it over the weekend and will update this post with any changes and provide a solution when I do succeed.

    Rest of the flow is as such:

    -> Grab sessionID from auth log, which is in the rough format:
       20120921 10:04:02.162 LOGIN_FAIL username sessionid
    -> With that sessionID, parse message log file for:
       20120921 10:04:02.162 AUTHREQ referer sessionid
    -> Sometimes there will be duplicate entries in message (i.e. same sessionID, different time, potentially different referer). If there are duplicates, I want to parse the time to find the one which is closest in time to the original auth event and then grab the referer from that, eventually counting the total per referer.
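One way the flow above might be sketched is shown below. It is not a tested solution for the real logs: the sample lines, referer URLs and session IDs are invented, the helper name to_epoch is hypothetical, timestamps are assumed to be UTC in exactly the "YYYYMMDD HH:MM:SS.mmm" shape quoted above, and if a session ID appears in several LOGIN_FAIL lines only the last one is kept.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Convert "YYYYMMDD HH:MM:SS.mmm" to fractional epoch seconds (assumed UTC).
sub to_epoch {
    my ($stamp) = @_;
    my ($y, $mo, $d, $h, $mi, $s, $ms) =
        $stamp =~ /^(\d{4})(\d{2})(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{3})$/
        or die "bad timestamp: $stamp\n";
    return timegm($s, $mi, $h, $d, $mo - 1, $y) + $ms / 1000;
}

# Invented sample data standing in for the two log files.
my @auth_lines = (
    '20120921 10:04:02.162 LOGIN_FAIL alice sessA',
    '20120921 10:09:30.000 LOGIN_FAIL bob sessB',
);
my @msg_lines = (
    '20120921 10:01:00.000 AUTHREQ http://one.example/ sessA',
    '20120921 10:04:01.900 AUTHREQ http://two.example/ sessA',
    '20120921 10:09:29.500 AUTHREQ http://two.example/ sessB',
    '20120921 10:10:00.000 AUTHREQ http://three.example/ sessC',
);

# session ID => epoch time of the auth event
my %auth_time;
for (@auth_lines) {
    my ($stamp, $id) = /^(\d{8} \S+) LOGIN_FAIL \S+ (\S+)$/ or next;
    $auth_time{$id} = to_epoch($stamp);
}

# session ID => [smallest time difference seen so far, referer]
my %best;
for (@msg_lines) {
    my ($stamp, $referer, $id) = /^(\d{8} \S+) AUTHREQ (\S+) (\S+)$/ or next;
    next unless exists $auth_time{$id};
    my $diff = abs(to_epoch($stamp) - $auth_time{$id});
    $best{$id} = [$diff, $referer]
        if !$best{$id} || $diff < $best{$id}[0];
}

# Count the winning referer of each session.
my %count;
$count{ $_->[1] }++ for values %best;

print "$_: $count{$_}\n" for sort keys %count;
```

With real files, the two arrays would be replaced by while loops over the file handles, so that neither log has to be held in memory.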

Re: Parsing Logs
by grizzley (Chaplain) on Oct 08, 2012 at 08:52 UTC
    What exactly is the <my-regex-goes-here($id)>? You could probably combine all IDs into one regexp and that way avoid the inner foreach loop if you don't care which ID matched (and even if you do care, simply use parens in the regexp and $1).
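A rough sketch of that idea, assuming the AUTHREQ line format posted earlier in the thread. The session IDs and sample lines are invented, and the names $alternation, $any_id and %hits are hypothetical. quotemeta is applied so that an ID containing regex metacharacters only matches literally.

```perl
use strict;
use warnings;

# Invented data; in the original post these arrays come from the log files.
my @sessionID = ('sessA', 'a.b');
my @array = (
    '20120921 10:04:01.900 AUTHREQ http://two.example/ sessA',
    '20120921 10:09:29.500 AUTHREQ http://two.example/ a.b',
    '20120921 10:10:00.000 AUTHREQ http://three.example/ axb',
);

# Build one alternation out of all IDs, escaping metacharacters.
# Note: an empty @sessionID would yield an empty alternation that
# matches any ID, so a real script should check for that first.
my $alternation = join '|', map { quotemeta($_) } @sessionID;
my $any_id      = qr/\bAUTHREQ \S+ ($alternation)$/;

my (@list, %hits);
for my $line (@array) {
    if ($line =~ $any_id) {
        push @list, $line;   # the line mentions one of the IDs
        $hits{$1}++;         # $1 says which ID it was
    }
}

print "$_\n" for @list;
```

For a very large number of IDs the hash lookup suggested in the other replies is likely the simpler choice; the single regexp mainly saves the inner loop.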
