PerlMonks  

Splitting up a file in hourly increments

by hallikpapa (Scribe)
on Dec 12, 2007 at 20:07 UTC [id://656702]

hallikpapa has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that tails a file and makes DB inserts based on its parsing of that file. One file is generated per day, but I would like to continue this same process while also writing the raw data out in hourly chunks. What is a good way to tell the script that, when it starts, it should open a file labeled (currentTime).curr, then at the end of the hour rename it to just (previousHour), open a new (currentTime).curr, and so on? This is what I am doing so far. db_record handles the inserts, so I figure right before that is where I should handle opening and closing the files as well as the writing.
    if ($connected) {
        $timestamp = time;
        if ( $timestamp < $midnight ) {
            log_notice("Client : Restarting $0\n");
            log_error("End Processing $src_cdr_file\n");
            exec "/home/$0" or log_warn("Client : Could not exec $0\n");
            exit 6;    # Something is wrong if this exit is taken
        } # End if $timestamp
        log_notice("Client : Normal Termination\n");
        log_error("End Processing $src_cdr_file\n\n");
        exec '/usr/bin/perl', "/home/$0"
            or log_warn("Client : Could not exec $0\n");
        exit 7;        # Something is wrong if this exit is taken
    } # End if $connected
}; # End anonymous sub

print $socket "tail\n";    # Rock n Roll
open( STDOUT, "> /dev/null" );

# Begin processing records
while ( defined( my $line = <$socket> ) ) {
    if ($recover_mode) {
        $rec_num++;
    }
    db_record( $sth, $line, $recover_from );
}
Something I was thinking about, though, is that the check may not land exactly on HH:MM:00, so the files might not be cut right at the hour boundary. Thanks for any suggestions!
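One way to sketch the rename-on-rollover scheme described above (the file-name pattern and the log_hour() helper are my own assumptions, not a tested implementation; note the cut happens on the first write after the hour changes, not exactly at HH:00:00):

```perl
#!/usr/bin/perl
# Sketch: write to (hour).curr, rename it to just (hour) when the hour
# rolls over, then start a fresh (hour).curr. Names are hypothetical.
use strict;
use warnings;
use POSIX qw(strftime);

# Hour label for a given epoch time, e.g. "2007121220".
sub log_hour { strftime("%Y%m%d%H", localtime(shift)) }

my $cur_hour = log_hour(time);
my $cur_path = "$cur_hour.curr";
open(my $out, '>>', $cur_path) or die "open $cur_path: $!";

sub write_line {
    my ($line) = @_;
    my $hour = log_hour(time);
    if ($hour ne $cur_hour) {                   # hour rolled over
        close($out);
        rename("$cur_hour.curr", $cur_hour)     # drop the .curr suffix
            or warn "rename $cur_hour.curr: $!";
        ($cur_hour, $cur_path) = ($hour, "$hour.curr");
        open($out, '>>', $cur_path) or die "open $cur_path: $!";
    }
    print $out $line;
}
```

Because the check runs on every write, a quiet stream can leave a stale .curr file past the hour; an alarm() or a timed select loop could force the rename if that matters.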

Replies are listed 'Best First'.
Re: Splitting up a file in hourly increments
by tuxz0r (Pilgrim) on Dec 12, 2007 at 20:20 UTC
    Does this script run continuously in the background, like a daemon, or is it run via cron or manually? If it runs continuously, then before writing the data out you just need to compare the current hour value with the hour value you used during your last write, and use that to decide when to create the new data file. If it is run via cron or manually, you'll need to store the last hour value from the previous run, and if the current hour differs, open a new file for writing using the current timestamp. How you run the program determines how you keep track of the hour value used to decide when a new data file is created.

    ---
    s;;:<).>|\;\;_>?\\^0<|=!]=,|{\$/.'>|<?.|/"&?=#!>%\$|#/\$%{};;y;,'} -/:-@[-`{-};,'}`-{/" -;;s;;$_;see;
    Warning: Any code posted by tuxz0r is untested, unless otherwise stated, and is used at your own risk.
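    For the cron/manual case described above, a sketch of persisting the last hour seen in a small state file (the state-file and data-file names are hypothetical, not from the post):

```perl
#!/usr/bin/perl
# Sketch: on each run, compare the current hour with the hour saved
# from the previous run; if it changed, record it and a new hourly
# data file is used. File names are assumptions for illustration.
use strict;
use warnings;
use POSIX qw(strftime);

my $state_file = "last_hour.state";             # hypothetical
my $this_hour  = strftime("%Y%m%d%H", localtime);

# Read the hour recorded by the previous run, if any.
my $last_hour = "";
if (open(my $st, '<', $state_file)) {
    chomp($last_hour = <$st> // "");
    close($st);
}

if ($this_hour ne $last_hour) {
    # Hour changed since the last run: remember the new one.
    open(my $st, '>', $state_file) or die "write $state_file: $!";
    print $st "$this_hour\n";
    close($st);
}

# Appending to a per-hour file means the "new file" falls out of the
# naming scheme automatically.
my $data_file = "data.$this_hour";              # hypothetical
open(my $out, '>>', $data_file) or die "open $data_file: $!";
print $out "record at " . localtime() . "\n";
close($out);
```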

      Oh man now that I think about it, this is kind of a dumb question, lol. It was a continuous script, and I have been just thinking in seconds and milliseconds so much lately, anything bigger just doesn't compute.
Re: Splitting up a file in hourly increments
by pc88mxer (Vicar) on Dec 12, 2007 at 20:52 UTC
    I'm going to take a stab at what I think you are doing - correct me if I don't have it right.

    It seems that you are monitoring a log file, and you want to split the data into hourly log files (and perhaps perform some processing on them, too.)

    Assuming this is what you want to do, here's my approach:

    use IO::Handle;   # for the flush method

    my ($t0, $path0, $out);
    $t0 = time;
    $path0 = path_for_time($t0);
    open($out, ">>", $path0);      # note below
    while (defined(my $line = readline())) {
        my $t = time;
        my $path1 = path_for_time($t);
        if ($path1 ne $path0) {
            close($out);
            $path0 = $path1;
            open($out, ">>", $path0);
        }
        print $out $line;
        $out->flush;               # note below
        # ... perform other processing on $line ...
    }
    All you need to supply is the path_for_time() subroutine.
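    For concreteness, a minimal path_for_time() might look like this (the logs/ directory and the name pattern are assumptions, not part of the post):

```perl
#!/usr/bin/perl
# Hypothetical path_for_time(): one file per hour, named after the
# local date and hour of the supplied epoch time.
use strict;
use warnings;
use POSIX qw(strftime);

sub path_for_time {
    my ($t) = @_;
    return strftime("logs/%Y-%m-%d-%H.log", localtime($t));
}
```

    Any function works as long as it returns the same path for all times within one hour and a different path once the hour changes.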

    The routine readline() can just read from a pipe from tail, as it seems you are doing, or you can use File::Tail to move that functionality into the Perl script itself. Also, I open the output files in append mode and call flush() in case you ever make this script restartable someday.

    Is this what you want to do?

Re: Splitting up a file in hourly increments
by andreas1234567 (Vicar) on Dec 13, 2007 at 09:16 UTC

Node Type: perlquestion [id://656702]
Approved by moritz