PerlMonks  

Slow but it works

by scottstef (Curate)
on Oct 03, 2001 at 16:01 UTC (#116416)
scottstef has asked for the wisdom of the Perl Monks concerning the following question:

I had a problem with our log analyzing software. The vendor said it was because the entries in our logs were not in sequential order. I bulled my way through a fix, and it works, but it takes forever. What would be the best way to speed it up without using anything outside the standard Perl distribution?
#!/usr/local/bin/perl -w
use strict;
$| = 1;
use diagnostics;
######################################################################
## This file is designed to reorder the log files for the corporate
## web server. These log files will be sequentially ordered for faster
## parsing.
######################################################################
my $existingLogFile;                            #the current file that we will parse
my $newLogFile = "$existingLogFile"."_sorted";  #the log file that will be written
my $logdirs = "/data/sortlogs/";
my @unique;
my $counter = 0;
my @timestamps;
opendir (LOGDIR, $logdirs) or die "no $logdirs: $!";
while ($_ = readdir(LOGDIR)) {
    if ($_ =~ /^access/) {
        $existingLogFile = $_;
    }
}
open (EXFILE, "$existingLogFile") or die "Could not open $existingLogFile $!";
while (<EXFILE>) {
    if (/^\d/) {
        chomp;
        my $timeString;
        my $begOfLog;
        my $endOfString;
        ($begOfLog, $timeString, $endOfString) = split /[\[\]]/,$_;
        push (@timestamps, $timeString);
    }
}
close (EXFILE);
@timestamps = sort (@timestamps);
$unique[0] = $timestamps[0];
foreach my $stamp(@timestamps) {
    unless ($stamp eq $unique[$counter]) {
        push (@unique, $stamp);
        $counter++;
    }
}
foreach my $ts (@unique) {
    open (SORTEDFILE, ">>$newLogFile") or die "Could not open $newLogFile $!\n";
    open (EXFILE, "+< $existingLogFile") or die "Could not open $existingLogFile $!";
    while (<EXFILE>) {
        my $entryStamp;
        my $before;
        my $after;
        ($before, $entryStamp, $after) = split /[\[\]]/,$_;
        my $strippedEntry = tell(EXFILE)
opendir (LOGDIR, $logdirs) or die "no $logdirs: $!";
while ($_ = readdir(LOGDIR)) {
    if ($_ =~ /^access/) {
        $existingLogFile = $_;
    }
}
open (EXFILE, "$existingLogFile") or die "Could not open $existingLogFile $!";
$newLogFile = "$existingLogFile"."_sorted";
while (<EXFILE>) {
    if (/^\d/) {
        chomp;
        my $timeString;
        my $begOfLog;
        my $endOfString;
        ($begOfLog, $timeString, $endOfString) = split /[\[\]]/,$_;
        push (@timestamps, $timeString);
    }
}
close (EXFILE);
@timestamps = sort (@timestamps);
$unique[0] = $timestamps[0];
foreach my $stamp(@timestamps) {
    unless ($stamp eq $unique[$counter]) {
        push (@unique, $stamp);
        $counter++;
    }
}
foreach my $ts (@unique) {
    open (SORTEDFILE, ">>$newLogFile") or die "Could not open $newLogFile $!\n";
    open (EXFILE, "+< $existingLogFile") or die "Could not open $existingLogFile $!";
    while (<EXFILE>) {
        my $entryStamp;
        my $before;
        my $after;
        ($before, $entryStamp, $after) = split /[\[\]]/,$_;
        my $strippedEntry = tell(EXFILE);
        if ($ts eq $entryStamp) {
            print SORTEDFILE "$_";
        } else {
            open (TEMPFILE, ">>/data/temp") or die "Couldn't open temp file $!\n";
            print TEMPFILE "$_";
        }
    }
    close (EXFILE);
    close (SORTEDFILE);
    close (TEMPFILE);
    rename ("/data/temp",$existingLogFile);
}
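The main cost in the script is the final loop: it reopens and rereads the entire log once for every distinct timestamp, so the work grows with (number of lines) × (number of distinct timestamps). A minimal sketch of a single-pass alternative, assuming the whole log fits in memory: group the lines in a hash of arrays keyed by the same bracketed text the script extracts with `split`, then emit the groups in the same plain string-sort order. `group_and_sort` is a hypothetical name, and putting lines without a bracketed timestamp under an empty key (so they come out first) is a choice the original script does not make.

```perl
use strict;
use warnings;

# Hypothetical helper: one pass over the lines, grouping them by the text
# between the first '[' and ']' (the same split the original script uses),
# then returning the groups in plain string-sort order of that text.
# Lines keep their file order within each group.
sub group_and_sort {
    my @lines = @_;
    my %by_stamp;    # timestamp => [ lines with that timestamp ]
    for my $line (@lines) {
        my (undef, $stamp) = split /[\[\]]/, $line;
        $stamp = '' unless defined $stamp;    # no brackets: empty key, sorts first
        push @{ $by_stamp{$stamp} }, $line;
    }
    return map { @{ $by_stamp{$_} } } sort keys %by_stamp;
}

my @demo = ("1.2.3.4 - - [b] x\n", "1.2.3.4 - - [a] y\n", "1.2.3.4 - - [b] z\n");
print group_and_sort(@demo);    # the "y" line, then "x", then "z"
```

In the script itself this could replace the last loop with something like `print SORTEDFILE group_and_sort(<EXFILE>);` once both handles are open, with no temp file and no rename.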

"The social dynamics of the net are a direct consequence of the fact that nobody has yet developed a Remote Strangulation Protocol." -- Larry Wall

Re: Slow but it works
by davorg (Chancellor) on Oct 03, 2001 at 16:22 UTC

    It does seem very complicated :) Can't you just use the Unix sort command to sort the files?

    And I think you may have made a mistake when copying your code into this post. What you have there doesn't compile - it is missing a closing brace somewhere.

    Update: I think there was a paste error in the code: a large chunk of it was repeated. Here's what I think the code should look like. (Note: I've reformatted it a bit, but haven't made any fixes yet.)

    #!/usr/local/bin/perl -w

    use strict;
    $| = 1;
    use diagnostics;

    ####################################################################
    ## This file is designed to reorder the log files for the corporate
    ## web server. These log files will be sequentially ordered for
    ## faster parsing.
    ####################################################################

    my $existingLogFile;
    my $newLogFile = "$existingLogFile"."_sorted";
    my $logdirs = "/data/sortlogs/";
    my @unique;
    my $counter = 0;
    my @timestamps;

    opendir (LOGDIR, $logdirs) or die "no $logdirs: $!";
    while ($_ = readdir(LOGDIR)) {
        if ($_ =~ /^access/) {
            $existingLogFile = $_;
        }
    }

    open (EXFILE, $existingLogFile)
        or die "Could not open $existingLogFile $!";
    while (<EXFILE>) {
        if (/^\d/) {
            chomp;
            my ($begOfLog, $timeString, $endOfString) = split /[\[\]]/, $_;
            push (@timestamps, $timeString);
        }
    }
    close (EXFILE);

    @timestamps = sort (@timestamps);
    $unique[0] = $timestamps[0];
    foreach my $stamp(@timestamps) {
        unless ($stamp eq $unique[$counter]) {
            push (@unique, $stamp);
            $counter++;
        }
    }

    foreach my $ts (@unique) {
        open (SORTEDFILE, ">>$newLogFile")
            or die "Could not open $newLogFile $!\n";
        open (EXFILE, "+<$existingLogFile")
            or die "Could not open $existingLogFile $!";
        while (<EXFILE>) {
            my ($before, $entryStamp, $after) = split /[\[\]]/, $_;
            my $strippedEntry = tell(EXFILE);
            if ($ts eq $entryStamp) {
                print SORTEDFILE $_;
            } else {
                open (TEMPFILE, ">>/data/temp")
                    or die "Couldn't open temp file $!\n";
                print TEMPFILE $_;
            }
        }
        close (EXFILE);
        close (SORTEDFILE);
        close (TEMPFILE);
        rename ("/data/temp",$existingLogFile);
    }
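One thing to check whichever way the sorting is done: if the entries carry Common Log Format timestamps such as `[03/Oct/2001:16:01:00 -0400]` (the post doesn't include a sample line, so this is an assumption), sorting the bracketed text as a plain string compares the day of the month first and the month names alphabetically, so a log spanning more than one day comes out misordered. A minimal sketch of one fix, holding the whole log in memory: turn each timestamp into a `YYYYMMDDhhmmss` key and sort on that with a Schwartzian transform. `sort_key` and `sort_log_lines` are hypothetical names, and the time-zone offset is ignored.

```perl
use strict;
use warnings;

# Assumption: entries carry a Common Log Format timestamp such as
#   [03/Oct/2001:16:01:00 -0400]
# The original post does not show a sample line, so adjust the regex to fit.
my %MONTH = (Jan => 1, Feb => 2, Mar => 3, Apr => 4,  May => 5,  Jun => 6,
             Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12);

# Hypothetical helper: "[03/Oct/2001:16:01:00" becomes "20011003160100",
# which sorts chronologically as a plain string. The zone offset is ignored,
# so this assumes every entry uses the same offset.
sub sort_key {
    my ($line) = @_;
    return '' unless $line =~ m{\[(\d\d)/(\w{3})/(\d{4}):(\d\d):(\d\d):(\d\d)};
    my ($d, $mon, $y, $H, $M, $S) = ($1, $2, $3, $4, $5, $6);
    return '' unless exists $MONTH{$mon};    # unparseable lines sort first
    return sprintf '%04d%02d%02d%02d%02d%02d', $y, $MONTH{$mon}, $d, $H, $M, $S;
}

# Schwartzian transform: compute each key once, sort on it, then keep the
# line. The original position breaks ties, so entries with the same
# timestamp stay in file order.
sub sort_log_lines {
    my @lines = @_;
    return map  { $_->[2] }
           sort { $a->[0] cmp $b->[0] || $a->[1] <=> $b->[1] }
           map  { [ sort_key($lines[$_]), $_, $lines[$_] ] } 0 .. $#lines;
}

my @demo = (
    qq{10.0.0.2 - - [04/Oct/2001:09:00:00 -0400] "GET /b HTTP/1.0" 200 10\n},
    qq{10.0.0.1 - - [03/Oct/2001:16:01:00 -0400] "GET /a HTTP/1.0" 200 10\n},
    qq{10.0.0.3 - - [30/Sep/2001:23:59:59 -0400] "GET /c HTTP/1.0" 200 10\n},
);
print sort_log_lines(@demo);    # the /c line, then /a, then /b
```

A plain string sort of the same three timestamps would put `03/Oct` first and `30/Sep` last, which is why the key conversion is needed.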
    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you don't talk about Perl club."

Re: Slow but it works
by gbarr (Monk) on Oct 03, 2001 at 17:53 UTC
    It is probably slow because you are doing an awful lot of reading and writing. An example log entry would have helped, but the following (with the addition of the appropriate error checking) will write the log file to STDOUT in sorted order, without a temp file and without reading the whole file into memory.

    my $MAXBUFLEN = 1024 * 1024; # 1M

    sub sort_logfile {
        my $file = shift;
        my @off = (0);
        my @lines;

        open(INFILE, "<$file");
        while(<INFILE>) {
            push @off, tell;
            if (/^\d.*?\[([^\]]+)\]/) {
                push @lines, [$1, $.-1];
            }
            else {
                # assume any non-matching line
                # is a continuation of the previous
                $off[-2] = $off[-1];
                next;
            }
        }

        @lines = sort { $a->[0] cmp $b->[0] } @lines;

        open(INFILE,"<$file");
        my $line = 0;
        my $buf;
        while (my $i = shift @lines) {
            my ($start, $end) = ($i->[1],$i->[1]+1);
            # merge runs of entries that are already adjacent in the file
            while( @lines and $lines[0]->[1] == $end) {
                shift @lines;
                $end++;
            }
            ($start,$end) = @off[$start,$end];
            my $len = $end - $start;
            # seek once; each read() then continues where the last one stopped
            seek(INFILE, $start, 0) or die "seek: $!";
            while ($len > 0) {
                read(INFILE, $buf='', $len > $MAXBUFLEN ? $MAXBUFLEN : $len)
                    or die "read: $!";
                print $buf;
                $len -= length $buf;
            }
        }
    }
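The core of the approach above is an index of byte offsets: remember where each entry starts, sort only the keys and offsets, then `seek` back and copy each entry out. A stripped-down sketch of just that idea, assuming one entry per line and the same bracketed-text key compared with `cmp`, and leaving out the merging of adjacent runs and the buffered copying. `sort_by_offsets` is a hypothetical name; both arguments are already-open filehandles.

```perl
use strict;
use warnings;

# Hypothetical helper sketching the offset-index idea: one pass records each
# line's sort key, byte offset and byte length; only that small index is
# sorted in memory, and every line is then fetched again with seek/read.
# Assumptions: one entry per line (no continuation lines), and the key is
# the text between the first '[' and ']', compared with cmp.
sub sort_by_offsets {
    my ($in, $out) = @_;    # already-open input and output handles
    my @index;              # [ key, offset, length, original position ]
    my $pos = tell $in;
    while (my $line = <$in>) {
        my $end = tell $in;
        my ($key) = $line =~ /\[([^\]]+)\]/;
        push @index, [ defined $key ? $key : '', $pos, $end - $pos, scalar @index ];
        $pos = $end;
    }
    for my $rec (sort { $a->[0] cmp $b->[0] || $a->[3] <=> $b->[3] } @index) {
        seek($in, $rec->[1], 0) or die "seek: $!";
        my $got = read($in, my $buf, $rec->[2]);
        die "read: $!" unless defined $got && $got == $rec->[2];
        print {$out} $buf;
    }
}

# Demo on an in-memory file (seek and tell work on these too).
my $log = "1.2.3.4 - - [b] x\n1.2.3.4 - - [a] y\n";
open my $in, '<', \$log or die "open: $!";
sort_by_offsets($in, \*STDOUT);    # prints the [a] line, then the [b] line
```

For a real log, open the input with `'<:raw'` so that offsets and lengths are byte counts, for example `open my $fh, '<:raw', $file or die "open $file: $!"; sort_by_offsets($fh, \*STDOUT);`.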
Re: Slow but it works
by pixel (Scribe) on Oct 03, 2001 at 17:55 UTC

    This piece of code:

    @timestamps = sort (@timestamps);
    $unique[0] = $timestamps[0];
    foreach my $stamp(@timestamps) {
        unless ($stamp eq $unique[$counter]) {
            push (@unique, $stamp);
            $counter++;
        }
    }

    looks like you're trying to copy the unique elements of @timestamps into @unique. I'd do that like this:

    my %unique;
    @unique{@timestamps} = ();
    my @unique = sort keys %unique;   # keys come back unordered, so sort again

    Blessed Be
    The Pixel
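A small runnable check of the hash-slice idiom on made-up timestamps. Because `keys` returns keys in no particular order, the unique values have to be sorted again if later code depends on their order, as the original `foreach my $ts (@unique)` loop does. The `grep { !$count{$_}++ }` line is the other common idiom, shown for comparison; it keeps first-seen order instead of sorting.

```perl
use strict;
use warnings;

my @timestamps = ('03/Oct/2001:16:01:00', '03/Oct/2001:15:00:00',
                  '03/Oct/2001:16:01:00');

# Hash-slice idiom: every element becomes a key (with an undef value), and
# duplicates collapse into a single key.
my %seen;
@seen{@timestamps} = ();

# keys() returns keys in no particular order, so sort them if the later
# code relies on order.
my @unique = sort keys %seen;

# Alternative that keeps first-seen order instead of sorting.
my %count;
my @first_seen = grep { !$count{$_}++ } @timestamps;

print "@unique\n";        # 03/Oct/2001:15:00:00 03/Oct/2001:16:01:00
print "@first_seen\n";    # 03/Oct/2001:16:01:00 03/Oct/2001:15:00:00
```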
