http://www.perlmonks.org?node_id=556770

sunadmn has asked for the wisdom of the Perl Monks concerning the following question:

Hey all I have what should be simple I hope to do. Below I have a bit of code that looks at a news servers stats file and adds some data up and inserts into a DB. This works fine as is but I have come to the conclusion that my data manipulation needs to be a bit fancier. The problem is this script is ran out of cron every four hours (six times daily), but when we reach the 23:59:59 hour the data could be 100% today 0% tomorrow or the exact opposite. With that said I was thinking I could use some sort of hash to take the unixtime stamp inside the file and store all of todays data in today and all of tomorrow in tomorrow. The problem is how do I store that data in that fashion.
Below is a sample set of the data I am manipulating.
1149621887 1149623161 stormchaser 69.0.0.0 69.0.0.0 +NG004193@net.NET 3 820 156 17570249 0 0 0 + 0 0 0 176 1149621900 1149623197 stormchaser 69.0.0.0 69.0.0.0 +NG004193@net.NET 1 648 80 00 0 0 0 + 0 0 93 1149622376 1149624912 stormchaser 69.0.0.0 69.0.0.0 +NG004193@net.NET 3 7932259 243 85227944 0 0 0 + 0 0 0 268 1149625677 1149625693 stormchaser 213.0.0.0 213.0.0.0 + Sinkbad@net.net 1 49385 0 00 0 0 0 + 0 0 1
Now in the above both the first and second line elements are unixtime stamps the first the start time for the query and the second the stop time for the query. I want to take the second and convert that to the actual date and store the total sum for that date all together (rather make a single DB insert for the given date and if there are two date two inserts one with the first dates total data and the second with the other dates total data). I have no idea of where to start to get where I need to be so I am here asking for the help of my fellow monks. If anyone has any suggestions on this matter please share with a lost monk.
my $sum =0; my $firststart; my $laststop; #my $unixtime; foreach my $line(@dataSUM) { my @lineData = split(/\t/, $line); my $startime = $lineData[0] if(&is_numeric($lineData[0])) || + die "$lineData[0] not numeric\n"; my $stoptime = $lineData[1] if(&is_numeric($lineData[1])) || + die "$lineData[1] not numeric\n"; # $unixtime = scalar localtime $lineData[1] if(&is_numeric( +$lineData[1])) || die "$lineData[1] not numeric\n"; my $xover_bytes = $lineData[7] if(&is_numeric($lineData[7])) || + die "$lineData[7] not numeric\n"; my $art_bytes = $lineData[9] if(&is_numeric($lineData[9])) || + die "$lineData[9] not numeric\n"; my $list_bytes = $lineData[11] if(&is_numeric($lineData[11])) || + die "$lineData[11] not numeric\n"; my $newnews_bytes = $lineData[13] if(&is_numeric($lineData[13])) || + die "$lineData[13] not numeric\n"; my $lineSum = $xover_bytes + $art_bytes + $list_bytes + $newn +ews_bytes; $sum += $lineSum; $firststart = ($startime < $firststart) || $firststart ==0 ? $star +time : $firststart; $laststop = ($stoptime > $laststop ) ? $stoptime : $laststop; } if($DEBUG) { print "This is our sum $sum\n our start $start\n our stop $stop\n"; } sub is_numeric { my $data = shift; return($data =~ m/^\d+$/); } sleep $MySQLrandom; openlog('statsdetailed', 'pid', 'user'); my $usagedbhconnect = DBI->connect("DBI:mysql:host=$dbhost2;database=$ +usagedb", $usagedbuser, $usagedbpass) || die syslog('alert', '%s', "MySQL con +nect to $dbhost2 failed data_source: $DBI::errstr") && print LOG "Connection to $usagedb at +$dbhost2 failed at $extension\n" if($LOG); if($usagedbhconnect && $LOG) { print LOG "Connection to $usagedb was a success\n"; } my $usagesqlQuery = "INSERT INTO usageSum (id,host,reseller,dat +e,startTime,StopTime,totalSum) values (?,?,?,?,?,?,?)"; my $usagesthloc = $usagedbhconnect->prepare($usagesqlQuery); $usagesthloc->execute("NULL","$usageHost","gl_design","NULL","$firstst +art","$laststop","$sum") || die syslog('alert', '%s', "MySQL execut +e failed command: $DBI::errstr") && print LOG "usage DB insertion failed at +$extension VALUES(NULL,$usageHost, \"gl_design\",$firststart,$la +ststop,$sum)\n" if($LOG); if($usagesthloc && $LOG) { print LOG "usage DB insertion was a success\n"; } $usagesthloc->finish; $usagedbhconnect->disconnect;
SUNADMN
USE PERL

Replies are listed 'Best First'.
Re: storing data in a hash
by GrandFather (Saint) on Jun 21, 2006 at 21:54 UTC

    I suspect what you are looking for is a hash of arrays. The following code may get you headed in the right direction:

    use strict; use warnings; my %dailyRecords; while (<DATA>) { chomp; next if ! length; my @record = split; # Pull out the data fields # Pull out Date and generate a date stamp my @now = reverse ((localtime ($record[1]))[3..5]); my $dateStamp = join '', map {sprintf '%04d', $_} @now; # Append this record to list keyed by date stamp push @{$dailyRecords{$dateStamp}}, [@record]; } for (sort keys %dailyRecords) { print "Day stamp: $_\n"; my @daysRecords = @{$dailyRecords{$_}}; print "@$_\n" for @daysRecords; } __DATA__ 1149621887 1149623161 NG004193@net.NET 3 820 156 1 +7570249 1149621900 1149623197 NG004193@net.NET 1 648 80 0 +0 1149692376 1149694912 NG004193@net.NET 3 7932259 243 8 +5227944 1149695677 1149695693 Sinkbad@net.net 1 49385 0 0 +0

    Prints:

    Day stamp: 010600050007 1149621887 1149623161 NG004193@net.NET 3 820 156 17570249 1149621900 1149623197 NG004193@net.NET 1 648 80 00 Day stamp: 010600050008 1149692376 1149694912 NG004193@net.NET 3 7932259 243 85227944 1149695677 1149695693 Sinkbad@net.net 1 49385 0 00

    DWIM is Perl's answer to Gödel
Re: storing data in a hash
by shmem (Chancellor) on Jun 22, 2006 at 16:04 UTC
    From what you write it is not very clear to me what you want to do.

    Does the log file span a day change?

    If so, is it, that given queries reflected in the logfile as for the following graph

    day 1 | day 2 ---- | -------- | -------| ----|-- -----|-- ----|--- --|---- |------ | -------
    you want to divide the data amount between the days and sum them up foreach date?

    Or is the the logfile always either one day or the other?

    In either case, use Time::Local. When does the script run? I assume the script runs roughly at any time between 20:30:00 and 03:00:00. Make a UNIX timestamp of the day change with

    my @now = localtime; my $hour = $now[2]; $now[$_] = 0 for 0..2; # set hour, min, sec to 0; $daychange = timelocal(@now); # add one days seconds if running before midnight $daychange += 86400 if $hour > 2;
    Then you can compare the timestamp in the second field of your rows to that timestamp and sum it to the right day sum:
    $sum{ $daystamp <= $stoptime ? 'first' : 'second' } += $linesum;
    Voila, you have the sum of one day in $sum{'first'} and of the next day in $sum{'second'}, now you can insert into your database whichever of the two contains data. If the log file spans days, you could split the amount of bytes according to the second before and after a days shift foreach row.

    cheers,
    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      shmen
      That doesn't seem to work at all as I have made a test file that has different dates in it and I still only get one or the other when testing to see if the $sum{'first'} or $sum{'second'} have content. Let me try to exaplain the problem again.

      OK so the data file has data that can be either from today or tomorrow in it. Data reads top down line for line, so lines 0-10 would be for today and then let's say lines 11-21 are for tomorrow. Now we want to take all the lines that belong to today and sum them up and insert the date (today's date from the UNIX time stamp in the file) and sum into our DB then insert the same for tomorrow. The time stamp fields represent a "RUN" period for the given line, meaning the time the query started and the time it stoped. We want to look at the stop time and convert the UNIX time stamp to scalar context so we have a human readible date. Now that we have that we want to take all Jun 22 data sum it all together and insert it into the DB and if we see Jun 23 we do the same and insert that data in a new row. Does this make sense?

      SUNADMN
      USE PERL
        Thinking it hard, no. How can the file have dates of tomorrow? Isn't it a log? Isn't it rather yesterday and today?

        --shmem

        _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                      /\_¯/(q    /
        ----------------------------  \__(m.====·.(_("always off the crowd"))."·
        ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
        Oops, serveral typos in the code I gave you. Always use -w ! :-)
        #!/usr/bin/perl -w use Time::Local; my @now = localtime; my $hour = $now[2]; $now[$_] = 0 for 0..2; # set hour, min, sec to 0; $daychange = timelocal(@now); my @dataSUM = <DATA>; my $sum =0; my $firststart; my $laststop; #my $unixtime; foreach my $line(@dataSUM) { my @lineData = split(/[ \t]+/, $line); my $startime = $lineData[0] if(&is_numeric($lineData[0])) || + die "$lineData[0] not numeric\n"; my $stoptime = $lineData[1] if(&is_numeric($lineData[1])) || + die "$lineData[1] not numeric\n"; # $unixtime = scalar localtime $lineData[1] if(&is_numeric( +$lineData[1])) || die "$lineData[1] not numeric\n"; my $xover_bytes = $lineData[7] if(&is_numeric($lineData[7])) || + die "$lineData[7] not numeric\n"; my $art_bytes = $lineData[9] if(&is_numeric($lineData[9])) || + die "$lineData[9] not numeric\n"; my $list_bytes = $lineData[11] if(&is_numeric($lineData[11])) || + die "$lineData[11] not numeric\n"; my $newnews_bytes = $lineData[13] if(&is_numeric($lineData[13])) || + die "$lineData[13] not numeric\n"; my $lineSum = $xover_bytes + $art_bytes + $list_bytes + $newn +ews_bytes; # $sum += $lineSum; $sum{ ($daychange >= $stoptime ? 'first' : 'second') } += $lineSum; $firststart = ($startime < $firststart) || $firststart ==0 ? $star +time : $firststart; $laststop = ($stoptime > $laststop ) ? $stoptime : $laststop; } map { print "$_ => $sum{$_}\n" } keys %sum; sub is_numeric { my $data = shift; return($data =~ m/^\d+$/); } __DATA__ 1149621887 1150922348 stormchaser 69.0.0.0 69.0.0.0 NG004193 +@net.NET 3 820 156 1757 0 0 0 0 + 0 0 176 1149621900 1150922348 stormchaser 69.0.0.0 69.0.0.0 NG004193 +@net.NET 1 648 80 00 0 0 0 0 + 0 93 1149622376 1150922348 stormchaser 69.0.0.0 69.0.0.0 NG004193 +@net.NET 3 793 243 852 0 0 0 0 + 0 0 268 1149625677 1150922348 stormchaser 213.0.0.0 213.0.0.0 Sinkbad@ +net.net 1 493 0 00 0 0 0 0 + 0 1 1149621887 1151008726 stormchaser 69.0.0.0 69.0.0.0 NG004193 +@net.NET 3 820 156 17570249 0 0 0 0 + 0 0 176 1149621900 1151008726 stormchaser 69.0.0.0 69.0.0.0 NG004193 +@net.NET 1 648 80 00 0 0 0 0 + 0 93 1149622376 1151008726 stormchaser 69.0.0.0 69.0.0.0 NG004193 +@net.NET 3 7932259 243 85227944 0 0 0 0 + 0 0 268
        Yields:
        first => 5363 second => 110731920

        cheers,
        --shmem

        _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                      /\_¯/(q    /
        ----------------------------  \__(m.====·.(_("always off the crowd"))."·
        ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}