PerlMonks  

I'm not sure what you were planning with the matrices: if you want to work further with this data, or move it into a database, you're probably best off pulling it into a hash, or an array-of-hashes.

If the file is very large, or memory is limited, you may have to read the file line by line, as others have suggested, insert each completed record into the database and then use that to perform whatever analyses made you want to put them there in the first place.

If you're more interested in a quick scan (how much klez this week?) then an AoH will be more fun. You should probably still read the file line by line, though. It might be more dashing to do one enormous split on -+, but it wouldn't be wise, especially if you reset $/ to do it. I really wouldn't do that: it's a little too sweeping.
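For what it's worth, here is a minimal sketch of that quick scan, not a finished tool. It assumes records shaped like the sample log in this thread and reads them from an in-memory string so it runs without a log file; the names @records and %count are just my own for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# sample text in the same shape as the log excerpt in this thread
my $log = <<'END';
----------------------------------
Date: 06/30/2002 00:01:21
From: pminich@foo.com
To: esquared@foofoo.com
File: value.scr
Action: The uncleanable file is deleted.
Virus: WORM_KLEZ.H
----------------------------------
Date: 06/30/2002 00:01:21
From: mef@mememe.com
To: inet@microsoft.com
File: Nr.pif
Action: The uncleanable file is deleted.
Virus: WORM_KLEZ.H
----------------------------------
END

# read line by line from an in-memory filehandle; a real script
# would open the log file here instead
open my $fh, '<', \$log or die "can't read sample: $!";

my @records;    # the AoH: one hashref per log entry
my %gather;     # fields of the entry currently being read
while (my $line = <$fh>) {
    if ($line =~ m/^(\w+):\s*(.+?)\s*$/) {
        $gather{lc $1} = $2;
    }
    elsif ($line =~ m/^-+\s*$/ && keys %gather) {
        push @records, { %gather };    # store a copy, then start afresh
        %gather = ();
    }
}
close $fh;

# the quick scan: how many entries per virus?
my %count;
$count{ $_->{virus} }++ for @records;
print "$_: $count{$_}\n" for sort keys %count;
```

Each element of @records is an ordinary hashref, so any other tally (per sender, per file name) is another one-liner over the same array.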

If there were a unique identifier with each record, then a HoH would be more useful: a big hash in which the keys come from your unique field and each value is another hash containing the foo=bar pairs you've extracted. The main advantage would be that you share a key with the original file, allowing (for example) incremental updates of the database.
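A rough sketch of the HoH idea, purely hypothetical since the real log has no such field: I've invented an "Id" line and made-up sample entries, and %seen stands in for whatever ids are already in your database.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical log with an invented "Id" field; the real log has none
my $log = <<'END';
----------------------------------
Id: 1001
Date: 06/30/2002 00:01:21
Virus: WORM_KLEZ.H
----------------------------------
Id: 1002
Date: 06/30/2002 00:04:10
Virus: WORM_KLEZ.H
----------------------------------
END

# pretend entry 1001 was stored on an earlier run
my %seen = (1001 => 1);

open my $fh, '<', \$log or die "can't read sample: $!";

my %by_id;      # the HoH: id => { field => value, ... }
my %gather;
while (my $line = <$fh>) {
    if ($line =~ m/^(\w+):\s*(.+?)\s*$/) {
        $gather{lc $1} = $2;
    }
    elsif ($line =~ m/^-+\s*$/ && keys %gather) {
        # pull the id out of the record and use it as the outer key
        my $id = delete $gather{id};
        die "record without an id" unless defined $id;
        $by_id{$id} = { %gather };
        %gather = ();
    }
}
close $fh;

# incremental update: only the ids we haven't stored before
my @new_ids = grep { !$seen{$_} } sort keys %by_id;
print "new entry $_: $by_id{$_}{virus}\n" for @new_ids;
```

The point is only that the shared key lets you skip what you already have; how you persist %seen between runs is up to you.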

But there doesn't seem to be a useful hook like that, unless perhaps the events are rare enough that you don't mind assuming the timestamp for each entry is unique. So everything would go in an array instead, and the array index could serve as a makeshift id. You could still use the dates to act on only part of the file, or just invoke your script from logrotate.
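Something like this would do for acting on only part of the data by date, with the array index as the makeshift id. It's a sketch that assumes the MM/DD/YYYY HH:MM:SS format from the sample log; the second record and the sortable() helper are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# records as the AoH would hold them; the second date is invented
my @records = (
    { date => '06/30/2002 00:01:21', virus => 'WORM_KLEZ.H' },
    { date => '07/01/2002 09:15:00', virus => 'WORM_KLEZ.H' },
);

# turn "MM/DD/YYYY HH:MM:SS" into "YYYYMMDDHHMMSS", which can be
# compared as a plain string
sub sortable {
    my ($date) = @_;
    my ($mon, $day, $year, $h, $m, $s) =
        $date =~ m{^(\d\d)/(\d\d)/(\d{4}) (\d\d):(\d\d):(\d\d)$}
        or die "unexpected date format: $date";
    return "$year$mon$day$h$m$s";
}

# keep the indexes (our makeshift ids) of entries on or after a cutoff
my $since = sortable('07/01/2002 00:00:00');
my @recent_ids =
    grep { sortable($records[$_]{date}) ge $since } 0 .. $#records;

print "entry $_: $records[$_]{virus} at $records[$_]{date}\n"
    for @recent_ids;
```

Bear in mind the index is only stable for as long as the file isn't rotated or truncated, which is why it's makeshift.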

I'll assume that you're putting everything in a database first and then working with it later. This is pretty hasty, but tested, and I've tried to keep it readable:

#!/usr/bin/perl
use strict;
use DBI;
use Data::Dumper;

# decide which bits of the records you want to keep. a field that
# never turns up in a record is bound as undef and stored as NULL
my @fields_to_store = qw(date name to file action virus);

# turn that into a hash with which to screen regex matches
my %field_ok = map { $_ => 1 } @fields_to_store;

# and two strings for the database insert statement: one of column
# names, one with the proper number of placeholders. the parens go
# round '?' alone so that x repeats a list, one '?' per column
my $field_list = join(',', @fields_to_store);
my $placeholders = join(',', ('?') x scalar(@fields_to_store));

# connect to the database
my $dsn = "DBI:mysql:database=xxxx;host=localhost";
my $dbh = DBI->connect($dsn, 'xxxx', 'xxxx', { 'RaiseError' => 1 });

# build the instruction that will be used to insert each record
my $insert_handle = $dbh->prepare("insert into xxxx ($field_list) values ($placeholders)");

# read the file. this %gather basket is crude, but effective
# enough, so i offer it in the spirit of tmtowtdi
my %gather;
while(<DATA>) {

    # match data line?
    if (m/^(\w+):\s*(.+?)\s*$/ && $field_ok{lc $1}) {
        die "overwriting $1 field: broken" if exists $gather{lc $1};
        $gather{lc $1} = $2;
    }

    # match dividing line?
    if (m/^-+\s*$/ && keys %gather) {

        # field order matters, of course, so use the fields_to_store array
        # in a map{} to order the contents of %gather, which would
        # otherwise be jumbled
        $insert_handle->execute( map { $gather{lc $_} } @fields_to_store );
        print Dumper \%gather;
        %gather = ();
    }
}

$insert_handle->finish;
$dbh->disconnect;

__DATA__
----------------------------------
Date: 06/30/2002 00:01:21
From: pminich@foo.com
To: esquared@foofoo.com
File: value.scr
Action: The uncleanable file is deleted.
Virus: WORM_KLEZ.H
----------------------------------
Date: 06/30/2002 00:01:21
From: mef@mememe.com
To: inet@microsoft.com
File: Nr.pif
Action: The uncleanable file is deleted.
Virus: WORM_KLEZ.H
----------------------------------

For your database to be of much use you'd really need to split the email field and store that in a separate table, with another table in between that and the main one to hold the links between log entries and addresses. By that stage it would already be worth looking for something like Class::DBI to do the drudgery for you.
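To make the three-table idea a bit more concrete, here's a sketch done with plain Perl data structures rather than real tables, so it runs without a database. %address_id plays the address table, @links the in-between table, and the array index the log entry id. All of those names, the "role" column, and the third sample record are my own assumptions, not anything from your schema.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# log entries as parsed; the third one is invented so that
# an address turns up more than once
my @records = (
    { from => 'pminich@foo.com', to => 'esquared@foofoo.com', virus => 'WORM_KLEZ.H' },
    { from => 'mef@mememe.com',  to => 'inet@microsoft.com',  virus => 'WORM_KLEZ.H' },
    { from => 'mef@mememe.com',  to => 'esquared@foofoo.com', virus => 'WORM_KLEZ.H' },
);

my %address_id;   # "address table": address => numeric id
my @links;        # "link table": [ entry id, address id, role ]

for my $entry_id (0 .. $#records) {
    for my $role (qw(from to)) {
        my $address = lc $records[$entry_id]{$role};

        # give each distinct address the next free id, once only
        unless (exists $address_id{$address}) {
            my $next = 1 + keys %address_id;
            $address_id{$address} = $next;
        }
        push @links, [ $entry_id, $address_id{$address}, $role ];
    }
}

# which log entries involve a given address, in either role?
my $wanted  = $address_id{'mef@mememe.com'};
my @entries = map { $_->[0] } grep { $_->[1] == $wanted } @links;
print "mef\@mememe.com appears in entries: @entries\n";
```

In a real database each of those structures becomes a table and the final grep becomes a join, which is exactly the drudgery something like Class::DBI is meant to take off your hands.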


In reply to Re: virus log parser by thpfft
in thread virus log parser by phaedo
