PerlMonks  

log file split into HoA

by vboy1997 (Initiate)
on May 15, 2011 at 05:30 UTC (#904917)
vboy1997 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have a huge mail log file in which each entry looks like this:

Dec 8 08:49:21 b.mx.sonic.net sm-mta18242: jB8GnCuK018242: from=<cj@oreilly.com>, size=10731, class="0", nrcpts=2, msgid=<E4461FEB-1B74-4612-80DC-3A39B04D89B1@oreilly.com>, proto=ESMTP, daemon=MTA, relay=mwest.oreilly.com 209.204.146.24
Dec 8 08:49:21 b.mx.sonic.net sm-mta18528: jB8GnCuK018242: to=<mkirk@corp.sonic.net>,<dane@corp.sonic.net>, delay=00:00:00, xdelay=00:00:00, mailer=esmtp, pri=160731, relay=lds.sonic.net. 208.201.249.231, dsn=2.0.0, stat=Sent (jB8GnLpC004736 Message accepted for delivery)

I want to split each entry into 4 parts and store them in a HoA (hash of arrays). The 4 parts are:

$header = everything before the $id
$id = jB8GnCuK018242
$to_from = from=<cj@oreilly.com> or to=<mkirk@corp.sonic.net>,<dane@corp.sonic.net>
$footer = everything after the $to_from section
Any help or pointers on how I can do this would be greatly appreciated. Thank you all.

Re: log file split into HoA
by John M. Dlugosz (Monsignor) on May 15, 2011 at 05:38 UTC
    Try using a regular expression with 4 captures. Your first task is to look carefully at the lines and figure out how to separate the parts. (Hint: ':' characters?).
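    A minimal sketch of the four-capture idea, assuming every line has the same shape as the two samples in the question. The helper name split_entry is hypothetical. The ": " after the program name and the ": " after the queue id mark the first two boundaries, and the ", " after the last <...> address marks the third.

```perl
use strict;
use warnings;

# Hypothetical helper: split one log line into (header, id, to_from, footer).
# Returns an empty list if the line does not have the expected shape.
sub split_entry {
    my ($line) = @_;
    return $line =~ m{
        ^(.*?):\s+                               # header: everything before the queue id
        (\w+):\s+                                # id, e.g. jB8GnCuK018242
        ((?:from|to)=<[^>]*>(?:,<[^>]*>)*),\s*   # to_from: one or more <...> addresses
        (.*)                                     # footer: the rest of the line
    }x;
}

my $line = 'Dec 8 08:49:21 b.mx.sonic.net sm-mta18528: jB8GnCuK018242: '
         . 'to=<mkirk@corp.sonic.net>,<dane@corp.sonic.net>, delay=00:00:00, mailer=esmtp';
my ($header, $id, $to_from, $footer) = split_entry($line);
print "$header\n$id\n$to_from\n$footer\n";
```

    The [^>]* classes keep each address inside its own angle brackets, so a "to=" list with several recipients is captured as a whole without depending on the spacing after the commas.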
Re: log file split into HoA
by Anonymous Monk on May 15, 2011 at 06:47 UTC
Re: log file split into HoA
by LanX (Canon) on May 15, 2011 at 11:23 UTC
        use strict;
        use warnings;
        use Data::Dump qw(pp);
        use English;

        my %HoA;
        my @AoA;

        while (<DATA>) {
            if ( my ($id, $fromto) = /: (\w+): ((?:from|to)=<.*?>),/ ) {
                $HoA{$id} = [ $PREMATCH, $fromto, $POSTMATCH ];
                $AoA[$.]  = [ $PREMATCH, $id, $fromto, $POSTMATCH ];
            }
            else {
                warn "Couldn't parse <<<$_>>> at line $.";
            }
        }

        pp(\%HoA);
        pp(\@AoA);

        __DATA__
        Dec 8 08:49:21 b.mx.sonic.net sm-mta18242: jB8GnCuK018242: from=<cj@oreilly.com>, size=10731, class="0", nrcpts=2, msgid=<E4461FEB-1B74-4612-80DC-3A39B04D89B1@oreilly.com>, proto=ESMTP, daemon=MTA, relay=mwest.oreilly.com 209.204.146.24
        Dec 8 08:49:21 b.mx.sonic.net sm-mta18528: jB8GnCuK018242: to=<mkirk@corp.sonic.net>,<dane@corp.sonic.net>, delay=00:00:00, xdelay=00:00:00, mailer=esmtp, pri=160731, relay=lds.sonic.net. 208.201.249.231, dsn=2.0.0, stat=Sent (jB8GnLpC004736 Message accepted for delivery)

    Both data lines have the same ID, so in the HoA the second line overwrites the first (as the output below shows). Maybe you don't want a HoA but an AoA.

    OUTPUT:

        {
          jB8GnCuK018242 => [
            "Dec 8 08:49:21 b.mx.sonic.net sm-mta18528",
            "to=<mkirk\@corp.sonic.net>",
            "<dane\@corp.sonic.net>, delay=00:00:00, xdelay=00:00:00, mailer=esmtp, pri=160731, relay=lds.sonic.net. 208.201.249.231, dsn=2.0.0, stat=Sent (jB8GnLpC004736 Message accepted for delivery) \n",
          ],
        }
        [
          undef,
          [
            "Dec 8 08:49:21 b.mx.sonic.net sm-mta18242",
            "jB8GnCuK018242",
            "from=<cj\@oreilly.com>",
            " size=10731, class=\"0\", nrcpts=2, msgid=<E4461FEB-1B74-4612-80DC-3A39B04D89B1\@oreilly.com>, proto=ESMTP, daemon=MTA, relay=mwest.oreilly.com 209.204.146.24\n",
          ],
          [
            "Dec 8 08:49:21 b.mx.sonic.net sm-mta18528",
            "jB8GnCuK018242",
            "to=<mkirk\@corp.sonic.net>",
            "<dane\@corp.sonic.net>, delay=00:00:00, xdelay=00:00:00, mailer=esmtp, pri=160731, relay=lds.sonic.net. 208.201.249.231, dsn=2.0.0, stat=Sent (jB8GnLpC004736 Message accepted for delivery) \n",
          ],
        ]
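    If you still want everything keyed by ID, one option (a sketch, not part of the original reply) is a hash whose values are arrays of entries: push each parsed line onto the list for its ID instead of assigning, so both lines for jB8GnCuK018242 survive. The sample lines are shortened, and reading from an array instead of a filehandle is only there to keep the sketch self-contained.

```perl
use strict;
use warnings;

# Sample input: the two lines from the question, shortened.
# In a real script these would come from something like
#   while (my $line = <$fh>) { ... }
my @lines = (
    'Dec 8 08:49:21 b.mx.sonic.net sm-mta18242: jB8GnCuK018242: from=<cj@oreilly.com>, size=10731, class="0"',
    'Dec 8 08:49:21 b.mx.sonic.net sm-mta18528: jB8GnCuK018242: to=<mkirk@corp.sonic.net>,<dane@corp.sonic.net>, delay=00:00:00',
);

my %HoA;    # id => [ [ header, to_from, footer ], ... ], one entry per log line

for my $i (0 .. $#lines) {
    my $line = $lines[$i];
    chomp $line;    # a no-op here, but needed for lines read from a file
    if ( my ($header, $id, $to_from, $footer) =
            $line =~ /^(.*?): (\w+): ((?:from|to)=<.*?>), (.*)$/ ) {
        # push instead of assigning, so a second line with the same id
        # is appended instead of overwriting the first one
        push @{ $HoA{$id} }, [ $header, $to_from, $footer ];
    }
    else {
        warn "Couldn't parse <<<$line>>> (input line ", $i + 1, ")\n";
    }
}

for my $id (sort keys %HoA) {
    print "$id: ", scalar @{ $HoA{$id} }, " entries\n";
}
```

    $HoA{jB8GnCuK018242}[1][1] then holds the "to=" part of the second line, and the order of the entries follows the order of the log.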

    Cheers Rolf

    UPDATE: better add a chomp right after the while line; otherwise the trailing newline ends up in the last field (the \n in the output above).

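    UPDATE: adding a space after the comma in the regex makes it catch the additional mail addresses as well. The lazy <.*?> can only stop at a ">" followed by ", ", and the recipients in to=<mkirk@corp.sonic.net>,<dane@corp.sonic.net> are separated by ">,<" without a space, so the capture extends to the last address:

        my ($id,$fromto)=/: (\w+): ((?:from|to)=<.*?>), /;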

    But be aware that this only works if you are sure about the format of your logs; otherwise you need a parser for mail addresses, which isn't trivial.
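    The effect of that extra space can be checked on the second sample line (shortened here). This sketch only runs the two regexes from this reply side by side; the variable names are illustrative.

```perl
use strict;
use warnings;

my $line = 'Dec 8 08:49:21 b.mx.sonic.net sm-mta18528: jB8GnCuK018242: '
         . 'to=<mkirk@corp.sonic.net>,<dane@corp.sonic.net>, delay=00:00:00';

# Original regex: the lazy <.*?> stops at the first ">,", i.e. after the first recipient
my ($id1, $fromto1) = $line =~ /: (\w+): ((?:from|to)=<.*?>),/;

# Updated regex: ">, " (with the space) only occurs after the last recipient,
# so <.*?> has to extend over ">,<" to reach it
my ($id2, $fromto2) = $line =~ /: (\w+): ((?:from|to)=<.*?>), /;

print "without space: $fromto1\n";
print "with space:    $fromto2\n";
```

    As noted above, this still relies on the log format: an address list that contained ">, " itself would be cut short.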

Node Type: perlquestion [id://904917]
Approved by GrandFather