Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re: Working with a very large log file (parsing data out)

by topher (Scribe)
on Feb 25, 2013 at 17:05 UTC ( #1020544=note: print w/ replies, xml ) Need Help??


in reply to Working with a very large log file (parsing data out)

At my previous job, I did a *lot* of log processing. As much as I love Perl, for quick and dirty ad hoc log mangling, awk was frequently my go-to tool. For cases exactly as you describe, I used the following:

cat logfile.log | awk '{count[$4]++}; END {for (x in count) {print count[x], x}};' | sort -nr

By using an associative array (hash) to track the unique values, you reduce the amount of data you have to sort by orders of magnitude (potentially).

Note: This is not a "max performance" solution. It is a "usually fast enough" solution. If you want maximum performance, there are lots of additional things you can do to make this faster. One of the easiest things (that often pays quick dividends on modern multi-core/CPU systems) is to compress your log files. This decreases the disk IO, and for many systems will be faster than reading the whole uncompressed file from disk.

zcat logfile.log.gz | awk '{count[$4]++}; END {for (x in count) {print count[x], x}};' | sort -nr

Another possible speedup would be to do a perl-equivalent of the awk, but to stop your line split at the number of fields you care about (plus 1 for "the rest"). This will frequently be faster than the awk example, but is slightly less suitable to manually typing in every time you're hitting a log file for ad hoc log queries. Although, looking at them side-by-side, it's really not much more difficult; I think it's just the hundreds of times I typed the awk version that makes it pop quickly from my fingers.

zcat logfile.log.gz | perl -ne '@line = split " ",$_, 5; $count{$line[3]}++; END {print "$count{$_} $_ \n" for (keys %count); };'


Comment on Re: Working with a very large log file (parsing data out)
Select or Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1020544]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (9)
As of 2014-10-22 10:39 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    For retirement, I am banking on:










    Results (115 votes), past polls