Re: Monk Specs.?
by talexb (Chancellor) on Jan 16, 2002 at 19:35 UTC
|
In terms of a code review, I can give you the following suggestions and comments.
- You're reading in the entire log file into an array with the line my @input = <LOG>; I don't know about you, but the log files I come across vary between large and immense. There's no reason to suck the entire file into memory. Better just to read a line at a time.
- You are apprently interested in logging where in the file a particular date starts. If you'd used while(<LOG>) you could have just used the builtin variable $.. In anycase, why not use something meaningful like $LineNumber rather than the anonymous $counter.
- Your declaration of $_ is redundant. Perl assumes $_ if you don't declare a loop variables. But that line disappears if you use the while loop mentioned above.
- At the end you test if the log file is 'blank'; wouldn't it be better to explain that you're testing to see if the date you were looking for was found? In any case, you've used eq which is a string compare. To see if a value is the same as zero, use ==
So if I were rewriting the code, I'd say
while ( <LOG> )
{
if ( /$date/ ) { print LOG_FILE <LOG>; } # UPDATED
}
What does this code do? If the target date is found while looking through the log file, print the remainder of the log file. The print will continue until the file runs out. At that point the loop will continue, but the log file has run out, and the loop will exit. (This assumes that the data is sorted!)
Study hard. :)
--t. alex
"Of course, you realize that this means war." -- Bugs Bunny.
Update: Added LOG_FILE file handle that was missing from the original post, as indicated by replies after this note.
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
What does this code do? If the target date is found while looking through the log file, print the remainder of the log file.
Note quite. As soon as the date changes, your script as it stands will stop printing additional lines to the other logfile (which, according to the initial script, is addressed by the LOGFILE file handle). You are printing the records back into the original logfile!
I ++ you anyway, because apart from that your advice is faultless.
--g r i n d e r
print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u';
| [reply] [Watch: Dir/Any] |
|
| [reply] [Watch: Dir/Any] [d/l] |
|
um... your points are valid, but this code is incorrect. *ducks*
while ( <LOG> )
{
if ( /$date/ ) { print <LOG>; }
}
firstly, you're reading from and printing to the same filehandle, probably a typo. it should print LOGFILE;. secondly, it's only printing lines that match the date, instead of printing all lines from the matched date until the end of the file.
~Particle | [reply] [Watch: Dir/Any] [d/l] [select] |
|
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
What was that admonition about sucking immense log files into memory?
And what does this do?
print LOG_FILE <LOG>;
How about: print LOG_FILE while <LOG>;
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Monk Specs.?
by grinder (Bishop) on Jan 16, 2002 at 19:33 UTC
|
Um, it is morally wrong to slurp log files into perl arrays :)
Just read the records one at a time, and when you encounter the date you are looking for, start writing to the other log file
my $found_date = 0;
while( <LOG> ) {
$found_date = 1 if /$date/o;
print LOGFILE if $found_date;
}
I think that even if you don't find the date, your script as it stands will still write the last line to second file. I can't decide whether that is a bug or a feature... if the latter, then you should print LOGFILE unless $found_date after the end of the while loop.
Note that saying print LOGFILE is the same as saying print LOGFILE $_ (the $_ is implied). In any event, you don't need to quote $_.
--g r i n d e r
print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u';
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Monk Specs.?
by Hofmator (Curate) on Jan 16, 2002 at 19:42 UTC
|
I'm not sure what you mean by 'morally OK' and 'Monk specifiation' but you could write your code shorter as
while (<LOG>) {
print LOGFILE $_ if /$date/..0;
}
-- Hofmator | [reply] [Watch: Dir/Any] [d/l] |
|
Nice. Its even shorter as:
/$date/..0 and print LOGFILE while <LOG>;
-Blake
| [reply] [Watch: Dir/Any] [d/l] |
|
| [reply] [Watch: Dir/Any] [d/l] |
|
Read up on the bi-stable .. operator in perlop under 'Range Operators'.
At first it evaluates to false. It changes its state to true once the condition on the left hand side (the regex matching $date) is true. It changes its state back to false when the right hand side is true. Here this is never the case (0) so the file is read until the end.
-- Hofmator
| [reply] [Watch: Dir/Any] |
Re: Monk Specs.?
by danger (Priest) on Jan 16, 2002 at 21:14 UTC
|
Well, we'll assume you've got strict and warnings turned on, and
that you've checked the return values when you opened your LOG and
LOGFILE filehandles :-)
It looks pretty good stylistically speaking. A couple of points
though: Log files can be large --- is there any other reason you need
to read it into an array? Also, if your log file is empty then
$start_time will be undef rather than 0 --- in fact, in your
algorithm, $start_time can never be 0 (it will be undef if the file
was empty or the $date was never found, and it will be greater than 0
if the $date is found ... even if is found in the first line of the
array (position 0) ... oops).
If you do not need @input except for in this snippet, then you do not
really need this array at all, and you won't need the $counter,
$start_time, or $end variables either. You can simply read through
the file and print everything between the first match of $date and
the end of file (eof) --- your entire snippet may be reduced to:
while(<LOG>){
next unless /$date/ .. eof(LOG);
print LOGFILE;
}
It isn't only shorter --- it helps avoid logic mistakes like for
instance: Consider what happens in your code if the $date is found in
the first line of the file (element 0 in the array). Your code has
already incremented the counter, so your output would skip the first
line of the file (the 0'th element of the array that actually
contains the search term). Trying to fix that by incrementing the
counter at the bottom of the loop instead would set your start_time
properly to 0, but then your "empty file" test would come into play
(start_time is 0, but the file wasn't empty). Lastly, if the date was
never found, what is going to happen in your code (with or without
changing the location of the $counter increment)?
When you find yourself creating additional variables that have little
or nothing to do with problem at hand, and more to do with the
details of your solution (book-keeping variables most commonly), you
should ask yourself if this is best way to express your solution
(such variables are often referred to as 'synthetic variables', see
references at end of this post). In the reduced snippet, all of the
identifiers relate to the problem space: $date is the search term,
LOG is the input file, eof(LOG) refers to the end of the input file,
and LOGFILE is the output file. The entire logic of the snippet is
captured in the range operator (see perlop).
If you did require the @input array elsewhere in your code, you can
still reduce your logic in a similar manner:
my @input = <LOG>;
foreach (@input) {
next unless /$date/ .. undef;
print LOGFILE;
}
Using the range operator like this may seem like a bit of a trick (or
at least esoteric enough that not everyone would think of it) --- so
let's consider an algorithm that doesn't use it at all but you still
want the array. We could begin as you did, trying to find the first
valid location in the array (but we have the problem that the first
valid location might be 0, and how will we tell if the location is 0
because that's where we found the search term, or its 0 because we
never found the search term --- we could use defined, but now we
not only have a synthetic variable, we have additional synthetic code
to determine what some of its values mean). What we want in this case
is a simple flag to tell us if we've found the search term or not ---
yes it is a synthetic variable, but it is the minimum required if we
aren't using the range op, it only means one thing (did we find it,
true or false), and we can restrict its scope:
my @input = <LOG>;
{
my $found = 0;
foreach (@input) {
$found = 1 if /foo/;
print LOGFILE if $found;
}
}
Dominus has a couple of articles the mention synthetic variables
at perl.com that are well worth reading:
http://www.perl.com/pub/a/2000/11/repair3x.html
http://www.perl.com/pub/a/2000/06/commify.html
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Monk Specs.?
by particle (Vicar) on Jan 16, 2002 at 19:37 UTC
|
well, i took a stab at it, but i didn't have much data to test against, so i made some up. this processes the log file line by line, so it doesn't use as much memory, and only passes through the file once, reading and processing in the same step. it tests the first eight characters of each line (the date,) against a predefined value, $date. if this condition holds, the line is printed, in my example, to the screen.
in order to use this, you'll have to change the DATA handle to LOG, and change print ; to print LOGFILE;
#!/usr/local/bin/perl -w
use strict;
my $date = '20020111';
while( <DATA> ) {
# ignore lines until the first eight characters of the line are
# greater than or equal to $date
next unless( substr($_, 0, 8) >= $date );
print ;
}
__DATA__
20020109 164741 892 26,212 Directories 52,846 Files (805.49 MB) Ba
+cked Up a
20020110 164741 892 26,212 Directories 52,846 Files (805.49 MB) Ba
+cked Up b
20020111 164741 892 26,212 Directories 52,846 Files (805.49 MB) Ba
+cked Up c
20020111 164741 892 26,212 Directories 52,846 Files (805.49 MB) Ba
+cked Up x
20020112 164741 892 26,212 Directories 52,846 Files (805.49 MB) Ba
+cked Up y
20020112 164742 892 26,212 Directories 52,846 Files (805.49 MB) Ba
+cked Up z
20020113 164741 892 26,212 Directories 52,846 Files (805.49 MB) Ba
+cked Up 1
~Particle | [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Monk Specs.?
by Anonymous Monk on Jan 16, 2002 at 19:32 UTC
|
my $counter = 0;
my $dateRX = qr/^$date/; # assuming $date is defined elsewhere
while (<LOG>){
last if /$dateRX/;
$counter++;
}
print LOGFILE $_ while (<LOG>); #assuming LOGFILE was opened earlier i
+n your code
Regards. | [reply] [Watch: Dir/Any] [d/l] |
|
my $counter = 0;
my $dateRX = qr/^$date/; # assuming $date is defined elsewhere
while (<LOG>){
last if /$dateRX/;
$counter++;
}
while ( <LOG> ) {
print LOGFILE;
$counter++;
}
UPDATE: I assumed that $counter was actually
doing something outside of the context of this code, maybe
I should not have assumed that. If the only thing that
counter is being used for is to assign the start date
variable, then my revision above is un-neccessary. It
should also be noted that in the original code, since
counter is incremented before the regexp is checked, then
the start date variable will always point to the first line
AFTER the one that matched. | [reply] [Watch: Dir/Any] [d/l] |
Re: Monk Specs.?
by Asim (Hermit) on Jan 16, 2002 at 22:49 UTC
|
I'm going to post a snip of the main loop for my Arcserve parser, mostly because I catch some of the weird things that pop up in our logs. Errors that start on a seperate line is one that tripped me up bad. These might help you if you run into them in your logs:
#%jobentry is defined outside of the while loop
while (<LOG>) {
next if /^\n/;
my %line;
my ($date, $time, $jobinfo, $jobtext);
chomp;
if (/^(E\d{4})/){
$jobtext .= "[$1] ";
$_ = substr $_, 6;
}elsif (!/^\d{8}/){
my $tmp_count = $count - 1;
$job_entry[$tmp_count]{'text'} .= $_;
next;
}
$date = substr $_, 0, 8;
$time = substr $_, 9, 6;
$jobinfo = substr $_, 16, 6;
$jobtext .= substr $_, 23;
#o_trim is a little sub for taking out any straggling whitespace in th
+e job description
$jobinfo = o_trim($jobinfo);
#Go on with rest of loop. I push the data into a hash, and then push t
+he hash into a database structure after I gather an entire job's wort
+h of entries.
}
The entire code is tested and works, if quite quirky, as I was playing with some ideas about data structures when I did it. I'll post it if there is any interest.
----Asim, known to some as Woodrow. | [reply] [Watch: Dir/Any] [d/l] |
Re: Monk Specs.?
by claree0 (Hermit) on Jan 16, 2002 at 19:48 UTC
|
I did something similar in this node and discovered just how much faster you can do it thanks to some very helpful replies. | [reply] [Watch: Dir/Any] |
Re: Monk Specs.?
by draper7 (Scribe) on Jan 16, 2002 at 21:43 UTC
|
Thanks everyone for the tons of info you've given me! Its amazing how many ways there are to do things in Perl, I guess I just like to go the long & slow way :)
I think the reason I used an array in the first place is because I'm pulling this information off several different server's, but that really doesn't give it much justice. Anyway I'm in the process of checking out all of the links suggested.
Thanks Again,
--JD | [reply] [Watch: Dir/Any] |