Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine

Monk Specs.?

by draper7 (Scribe)
on Jan 16, 2002 at 19:04 UTC ( #139214=perlquestion: print w/replies, xml ) Need Help??

draper7 has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks, I come in seek of knowledge..
The following is a snippet of code from a script I wrote a while back that grabs info from Arcserve summary logs. I'm very self consious about the code I write and was wondering if this is morally ok? This part just gathers all info between the first occurance of the date and the last line of the file.
Does this meet Monk specification?
my @input = <LOG>; my $counter = 0; my $start_time; # this finds the date we are looking for # and stores its array location # sample of the lines we are looking at -> # 20020111 164741 892 26,212 Directories 52,846 Files (805.49 MB) +Backed Up to Media. foreach $_ (@input) { $counter++; if (/$date/) { $start_time = $counter; last; } } # last line my $end = $#input; # if log file is blank if ($start_time eq 0) { $start_time = $end; } # print summary to another log foreach $_ (@input[$start_time..$end]) { print LOGFILE "$_"; }

Replies are listed 'Best First'.
Re: Monk Specs.?
by talexb (Chancellor) on Jan 16, 2002 at 19:35 UTC
    In terms of a code review, I can give you the following suggestions and comments.
    • You're reading in the entire log file into an array with the line my @input = <LOG>; I don't know about you, but the log files I come across vary between large and immense. There's no reason to suck the entire file into memory. Better just to read a line at a time.
    • You are apprently interested in logging where in the file a particular date starts. If you'd used while(<LOG>) you could have just used the builtin variable $.. In anycase, why not use something meaningful like $LineNumber rather than the anonymous $counter.
    • Your declaration of $_ is redundant. Perl assumes $_ if you don't declare a loop variables. But that line disappears if you use the while loop mentioned above.
    • At the end you test if the log file is 'blank'; wouldn't it be better to explain that you're testing to see if the date you were looking for was found? In any case, you've used eq which is a string compare. To see if a value is the same as zero, use ==
    So if I were rewriting the code, I'd say
    while ( <LOG> ) { if ( /$date/ ) { print LOG_FILE <LOG>; } # UPDATED }
    What does this code do? If the target date is found while looking through the log file, print the remainder of the log file. The print will continue until the file runs out. At that point the loop will continue, but the log file has run out, and the loop will exit. (This assumes that the data is sorted!)

    Study hard. :)

    --t. alex

    "Of course, you realize that this means war." -- Bugs Bunny.

    Update: Added LOG_FILE file handle that was missing from the original post, as indicated by replies after this note.

      What does this code do? If the target date is found while looking through the log file, print the remainder of the log file.

      Note quite. As soon as the date changes, your script as it stands will stop printing additional lines to the other logfile (which, according to the initial script, is addressed by the LOGFILE file handle). You are printing the records back into the original logfile!

      I ++ you anyway, because apart from that your advice is faultless.

      g r i n d e r
      print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u';
        Eeyow. I hate when that happens. OK, I change that line to print LOG_FILE <LOG>; My heart was in the right place. :)

        --t. alex

        "Of course, you realize that this means war." -- Bugs Bunny.

      um... your points are valid, but this code is incorrect. *ducks*

      while ( <LOG> ) { if ( /$date/ ) { print <LOG>; } }
      firstly, you're reading from and printing to the same filehandle, probably a typo. it should print LOGFILE;. secondly, it's only printing lines that match the date, instead of printing all lines from the matched date until the end of the file.


        Hmm .. don't think my code is wrong. The if statement watches for the correct date, and when that date's found, the entire rest of the file is output. My original reply had output going to STDOUT, but I later fixed that so it went to LOG_FILE, as in the original post. The input file handle is LOG.

        --t. alex

        "Of course, you realize that this means war." -- Bugs Bunny.

      What was that admonition about sucking immense log files into memory?

      And what does this do?

      print LOG_FILE <LOG>;

      How about:
      print LOG_FILE while <LOG>;
Re: Monk Specs.?
by grinder (Bishop) on Jan 16, 2002 at 19:33 UTC
    Um, it is morally wrong to slurp log files into perl arrays :)

    Just read the records one at a time, and when you encounter the date you are looking for, start writing to the other log file

    my $found_date = 0; while( <LOG> ) { $found_date = 1 if /$date/o; print LOGFILE if $found_date; }

    I think that even if you don't find the date, your script as it stands will still write the last line to second file. I can't decide whether that is a bug or a feature... if the latter, then you should print LOGFILE unless $found_date after the end of the while loop.

    Note that saying print LOGFILE is the same as saying print LOGFILE $_ (the $_ is implied). In any event, you don't need to quote $_.

    g r i n d e r
    print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u';
Re: Monk Specs.?
by Hofmator (Curate) on Jan 16, 2002 at 19:42 UTC

    I'm not sure what you mean by 'morally OK' and 'Monk specifiation' but you could write your code shorter as

    while (<LOG>) { print LOGFILE $_ if /$date/..0; }

    -- Hofmator

      Nice. Its even shorter as:
      /$date/..0 and print LOGFILE while <LOG>;


      wow. super cool one liner, Hofmator!

      can you explain to me the construct (specifically if /$date/..0;)? if i can understand it, i can use it :)


        Read up on the bi-stable .. operator in perlop under 'Range Operators'.

        At first it evaluates to false. It changes its state to true once the condition on the left hand side (the regex matching $date) is true. It changes its state back to false when the right hand side is true. Here this is never the case (0) so the file is read until the end.

        -- Hofmator

Re: Monk Specs.?
by danger (Priest) on Jan 16, 2002 at 21:14 UTC

    Well, we'll assume you've got strict and warnings turned on, and that you've checked the return values when you opened your LOG and LOGFILE filehandles :-)

    It looks pretty good stylistically speaking. A couple of points though: Log files can be large --- is there any other reason you need to read it into an array? Also, if your log file is empty then $start_time will be undef rather than 0 --- in fact, in your algorithm, $start_time can never be 0 (it will be undef if the file was empty or the $date was never found, and it will be greater than 0 if the $date is found ... even if is found in the first line of the array (position 0) ... oops).

    If you do not need @input except for in this snippet, then you do not really need this array at all, and you won't need the $counter, $start_time, or $end variables either. You can simply read through the file and print everything between the first match of $date and the end of file (eof) --- your entire snippet may be reduced to:

    while(<LOG>){ next unless /$date/ .. eof(LOG); print LOGFILE; }

    It isn't only shorter --- it helps avoid logic mistakes like for instance: Consider what happens in your code if the $date is found in the first line of the file (element 0 in the array). Your code has already incremented the counter, so your output would skip the first line of the file (the 0'th element of the array that actually contains the search term). Trying to fix that by incrementing the counter at the bottom of the loop instead would set your start_time properly to 0, but then your "empty file" test would come into play (start_time is 0, but the file wasn't empty). Lastly, if the date was never found, what is going to happen in your code (with or without changing the location of the $counter increment)?

    When you find yourself creating additional variables that have little or nothing to do with problem at hand, and more to do with the details of your solution (book-keeping variables most commonly), you should ask yourself if this is best way to express your solution (such variables are often referred to as 'synthetic variables', see references at end of this post). In the reduced snippet, all of the identifiers relate to the problem space: $date is the search term, LOG is the input file, eof(LOG) refers to the end of the input file, and LOGFILE is the output file. The entire logic of the snippet is captured in the range operator (see perlop).

    If you did require the @input array elsewhere in your code, you can still reduce your logic in a similar manner:

    my @input = <LOG>; foreach (@input) { next unless /$date/ .. undef; print LOGFILE; }

    Using the range operator like this may seem like a bit of a trick (or at least esoteric enough that not everyone would think of it) --- so let's consider an algorithm that doesn't use it at all but you still want the array. We could begin as you did, trying to find the first valid location in the array (but we have the problem that the first valid location might be 0, and how will we tell if the location is 0 because that's where we found the search term, or its 0 because we never found the search term --- we could use defined, but now we not only have a synthetic variable, we have additional synthetic code to determine what some of its values mean). What we want in this case is a simple flag to tell us if we've found the search term or not --- yes it is a synthetic variable, but it is the minimum required if we aren't using the range op, it only means one thing (did we find it, true or false), and we can restrict its scope:

    my @input = <LOG>; { my $found = 0; foreach (@input) { $found = 1 if /foo/; print LOGFILE if $found; } }

    Dominus has a couple of articles the mention synthetic variables at that are well worth reading:

Re: Monk Specs.?
by particle (Vicar) on Jan 16, 2002 at 19:37 UTC
    well, i took a stab at it, but i didn't have much data to test against, so i made some up. this processes the log file line by line, so it doesn't use as much memory, and only passes through the file once, reading and processing in the same step. it tests the first eight characters of each line (the date,) against a predefined value, $date. if this condition holds, the line is printed, in my example, to the screen.

    in order to use this, you'll have to change the DATA handle to LOG, and change print ; to print LOGFILE;

    #!/usr/local/bin/perl -w use strict; my $date = '20020111'; while( <DATA> ) { # ignore lines until the first eight characters of the line are # greater than or equal to $date next unless( substr($_, 0, 8) >= $date ); print ; } __DATA__ 20020109 164741 892 26,212 Directories 52,846 Files (805.49 MB) Ba +cked Up a 20020110 164741 892 26,212 Directories 52,846 Files (805.49 MB) Ba +cked Up b 20020111 164741 892 26,212 Directories 52,846 Files (805.49 MB) Ba +cked Up c 20020111 164741 892 26,212 Directories 52,846 Files (805.49 MB) Ba +cked Up x 20020112 164741 892 26,212 Directories 52,846 Files (805.49 MB) Ba +cked Up y 20020112 164742 892 26,212 Directories 52,846 Files (805.49 MB) Ba +cked Up z 20020113 164741 892 26,212 Directories 52,846 Files (805.49 MB) Ba +cked Up 1


Re: Monk Specs.?
by Anonymous Monk on Jan 16, 2002 at 19:32 UTC
    my $counter = 0; my $dateRX = qr/^$date/; # assuming $date is defined elsewhere while (<LOG>){ last if /$dateRX/; $counter++; } print LOGFILE $_ while (<LOG>); #assuming LOGFILE was opened earlier i +n your code

      Only problem is that you are no longer incrementing counter after you encounter the date.

      my $counter = 0; my $dateRX = qr/^$date/; # assuming $date is defined elsewhere while (<LOG>){ last if /$dateRX/; $counter++; } while ( <LOG> ) { print LOGFILE; $counter++; }

      UPDATE: I assumed that $counter was actually doing something outside of the context of this code, maybe I should not have assumed that. If the only thing that counter is being used for is to assign the start date variable, then my revision above is un-neccessary. It should also be noted that in the original code, since counter is incremented before the regexp is checked, then the start date variable will always point to the first line AFTER the one that matched.

Re: Monk Specs.?
by Asim (Hermit) on Jan 16, 2002 at 22:49 UTC

    I'm going to post a snip of the main loop for my Arcserve parser, mostly because I catch some of the weird things that pop up in our logs. Errors that start on a seperate line is one that tripped me up bad. These might help you if you run into them in your logs:

    #%jobentry is defined outside of the while loop while (<LOG>) { next if /^\n/; my %line; my ($date, $time, $jobinfo, $jobtext); chomp; if (/^(E\d{4})/){ $jobtext .= "[$1] "; $_ = substr $_, 6; }elsif (!/^\d{8}/){ my $tmp_count = $count - 1; $job_entry[$tmp_count]{'text'} .= $_; next; } $date = substr $_, 0, 8; $time = substr $_, 9, 6; $jobinfo = substr $_, 16, 6; $jobtext .= substr $_, 23; #o_trim is a little sub for taking out any straggling whitespace in th +e job description $jobinfo = o_trim($jobinfo); #Go on with rest of loop. I push the data into a hash, and then push t +he hash into a database structure after I gather an entire job's wort +h of entries. }

    The entire code is tested and works, if quite quirky, as I was playing with some ideas about data structures when I did it. I'll post it if there is any interest.

    ----Asim, known to some as Woodrow.

Re: Monk Specs.?
by claree0 (Hermit) on Jan 16, 2002 at 19:48 UTC
    I did something similar in this node and discovered just how much faster you can do it thanks to some very helpful replies.
Re: Monk Specs.?
by draper7 (Scribe) on Jan 16, 2002 at 21:43 UTC
      Thanks everyone for the tons of info you've given me! Its amazing how many ways there are to do things in Perl, I guess I just like to go the long & slow way :)
      I think the reason I used an array in the first place is because I'm pulling this information off several different server's, but that really doesn't give it much justice. Anyway I'm in the process of checking out all of the links suggested.
      Thanks Again,


Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://139214]
Approved by root
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (4)
As of 2022-11-30 21:29 GMT
Find Nodes?
    Voting Booth?