PerlMonks  

Your lines have four non-space items followed by a series of this=that pairs, where that may contain spaces. I would first split on whitespace, using the third argument to split to limit the result to five fields. I would then use a global regex match to pull the thises and thats out of the fifth field as key/value pairs to populate a hash. The regex uses a look-ahead so that each value stops just before the next pair without consuming it. I use Data::Dumper here to show what has been parsed from the file.

use strict;
use warnings;

use Data::Dumper;

# key = value, where the value runs up to (but not into) the
# next key= or the end of the string.
my $rxExtractFields = qr
    {(?x)
        \s*
        (\S+)
        =
        \s*
        (\S.*?)
        (?= \s*\S+= | \z )
    };

# Read the sample data from an in-memory file (a here-doc).
open my $inFH, q{<}, \ <<'END_OF_FILE' or die qq{open: $!\n};
2007-11-16 16:04:33 Local1.Alert 128.29.29.40 id=firewall time="2007-11-16 16:04:08" fw=WS2000-Store 29 pri=1 proto=6(tcp) src=128.29.29.200 dst=128.29.100.102 mid= 1013 mtp= 2 msg=TCP connection request received is invalid, dropping packet Src 23 Dst 4412 from EXT n/w agent=Firewall
2007-11-16 16:05:05 Local1.Alert 128.24.24.40 id=firewall time="2007-11-16 16:03:25" fw=WS2000-Store 24 pri=1 proto=6(tcp) src=128.24.24.200 dst=128.24.100.101 mid= 1013 mtp= 2 msg=TCP connection request received is invalid, dropping packet Src 23 Dst 4344 from EXT n/w agent=Firewall
2007-11-16 16:05:34 Local1.Alert 128.29.29.40 id=firewall time="2007-11-16 16:05:09" fw=WS2000-Store 29 pri=1 proto=6(tcp) src=128.29.29.200 dst=128.29.100.102 mid= 1013 mtp= 2 msg=TCP connection request received is invalid, dropping packet Src 23 Dst 4412 from EXT n/w agent=Firewall
2007-11-16 16:05:39 Local1.Alert 128.2.2.40 id=firewall time="2007-11-16 16:03:36" fw=WS2000-Store 02 pri=1 proto=6(tcp) src=128.2.2.200 dst=128.2.100.106 mid= 1013 mtp= 2 msg=TCP connection request received is invalid, dropping packet Src 23 Dst 4631 from EXT n/w agent=Firewall
2007-11-16 16:05:40 Local1.Alert 128.2.2.40 id=firewall time="2007-11-16 16:03:36" fw=WS2000-Store 02 pri=1 proto=6(tcp) src=128.2.2.200 dst=128.2.100.106 mid= 1013 mtp= 2 msg=TCP connection request received is invalid, dropping packet Src 23 Dst 4631 from EXT n/w agent=Firewall
2007-11-16 16:05:40 Local1.Alert 128.2.2.40 id=firewall time="2007-11-16 16:03:37" fw=WS2000-Store 02 pri=1 proto=6(tcp) src=128.2.2.200 dst=128.2.100.106 mid= 1013 mtp= 2 msg=TCP connection request received is invalid, dropping packet Src 23 Dst 4631 from EXT n/w agent=Firewall
END_OF_FILE

my @parsedData = ();
while ( <$inFH> )
{
    chomp;

    # The limit of 5 keeps all of the this=that pairs together
    # in $restOfLine.
    my ( $date, $time, $type, $ip, $restOfLine ) =
        split m{\s+}, $_, 5;

    # In list context the global match returns key, value, key,
    # value ... which populates the hash directly.
    my %pairs = $restOfLine =~ m{$rxExtractFields}g;

    push @parsedData,
        {
            field1 => $date,
            field2 => $time,
            field3 => $type,
            field4 => $ip,
            %pairs,
        };
}

close $inFH or die qq{close: $!\n};

print Data::Dumper->Dumpxs( [ \ @parsedData], [ q{*parsedData} ] );

Here's the output.

@parsedData = (
  {
    'msg' => 'TCP connection request received is invalid, dropping packet Src 23 Dst 4412 from EXT n/w',
    'proto' => '6(tcp)',
    'time' => '"2007-11-16 16:04:08"',
    'src' => '128.29.29.200',
    'field4' => '128.29.29.40',
    'field2' => '16:04:33',
    'field3' => 'Local1.Alert',
    'mtp' => '2',
    'mid' => '1013',
    'fw' => 'WS2000-Store 29',
    'field1' => '2007-11-16',
    'agent' => 'Firewall',
    'pri' => '1',
    'id' => 'firewall',
    'dst' => '128.29.100.102'
  },
  {
    'msg' => 'TCP connection request received is invalid, dropping packet Src 23 Dst 4344 from EXT n/w',
    'proto' => '6(tcp)',
    'time' => '"2007-11-16 16:03:25"',
    'src' => '128.24.24.200',
    'field4' => '128.24.24.40',
    'field2' => '16:05:05',
    'field3' => 'Local1.Alert',
    'mtp' => '2',
    'fw' => 'WS2000-Store 24',
    'mid' => '1013',
    'field1' => '2007-11-16',
    'agent' => 'Firewall',
    'id' => 'firewall',
    'pri' => '1',
    'dst' => '128.24.100.101'
  },
  {
    'msg' => 'TCP connection request received is invalid, dropping packet Src 23 Dst 4412 from EXT n/w',
    'proto' => '6(tcp)',
    'time' => '"2007-11-16 16:05:09"',
    'src' => '128.29.29.200',
    'field4' => '128.29.29.40',
    'field2' => '16:05:34',
    'field3' => 'Local1.Alert',
    'mtp' => '2',
    'fw' => 'WS2000-Store 29',
    'mid' => '1013',
    'field1' => '2007-11-16',
    'agent' => 'Firewall',
    'id' => 'firewall',
    'pri' => '1',
    'dst' => '128.29.100.102'
  },
  {
    'msg' => 'TCP connection request received is invalid, dropping packet Src 23 Dst 4631 from EXT n/w',
    'proto' => '6(tcp)',
    'time' => '"2007-11-16 16:03:36"',
    'src' => '128.2.2.200',
    'field4' => '128.2.2.40',
    'field2' => '16:05:39',
    'field3' => 'Local1.Alert',
    'mtp' => '2',
    'fw' => 'WS2000-Store 02',
    'mid' => '1013',
    'field1' => '2007-11-16',
    'agent' => 'Firewall',
    'id' => 'firewall',
    'pri' => '1',
    'dst' => '128.2.100.106'
  },
  {
    'msg' => 'TCP connection request received is invalid, dropping packet Src 23 Dst 4631 from EXT n/w',
    'proto' => '6(tcp)',
    'time' => '"2007-11-16 16:03:36"',
    'src' => '128.2.2.200',
    'field4' => '128.2.2.40',
    'field2' => '16:05:40',
    'field3' => 'Local1.Alert',
    'mtp' => '2',
    'fw' => 'WS2000-Store 02',
    'mid' => '1013',
    'field1' => '2007-11-16',
    'agent' => 'Firewall',
    'id' => 'firewall',
    'pri' => '1',
    'dst' => '128.2.100.106'
  },
  {
    'msg' => 'TCP connection request received is invalid, dropping packet Src 23 Dst 4631 from EXT n/w',
    'proto' => '6(tcp)',
    'time' => '"2007-11-16 16:03:37"',
    'src' => '128.2.2.200',
    'field4' => '128.2.2.40',
    'field2' => '16:05:40',
    'field3' => 'Local1.Alert',
    'mtp' => '2',
    'fw' => 'WS2000-Store 02',
    'mid' => '1013',
    'field1' => '2007-11-16',
    'agent' => 'Firewall',
    'id' => 'firewall',
    'pri' => '1',
    'dst' => '128.2.100.106'
  }
);
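
If it helps to see why the look-ahead matters, here is a minimal sketch on a single line. The sample line is a shortened, made-up variant of the log lines, and $rxNaive and the other variable names are only for illustration. The first pattern is a compact form of the one in the script; the naive pattern is there purely for contrast, since its values cannot contain spaces.

```perl
use strict;
use warnings;

# Hypothetical, shortened sample line (not one of the real log lines).
my $line = q{2007-11-16 16:04:33 Local1.Alert 128.29.29.40 }
         . q{id=firewall fw=WS2000-Store 29 mid= 1013 }
         . q{msg=dropping packet from EXT n/w agent=Firewall};

# A limit of 5 leaves everything after the fourth item unsplit.
my ( $date, $time, $type, $ip, $rest ) = split m{\s+}, $line, 5;

# The look-ahead ends each value just before the next "key="
# (or at the end of the string) without consuming it.
my $rxWithLookAhead = qr{\s*(\S+)=\s*(\S.*?)(?=\s*\S+=|\z)};

# For contrast only: a value here is a single run of non-space.
my $rxNaive = qr{(\S+)=\s*(\S+)};

my %good  = $rest =~ m{$rxWithLookAhead}g;
my %naive = $rest =~ m{$rxNaive}g;

print "look-ahead: fw='$good{fw}' msg='$good{msg}'\n";
print "naive:      fw='$naive{fw}' msg='$naive{msg}'\n";
```

With the naive pattern the fw and msg values are cut off at their first space, whereas the look-ahead version keeps the whole of each value.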

I hope this is of interest.

Cheers,

JohnGG


In reply to Re: Parsing a log file by johngg
in thread Parsing a log file by TStanley
