http://www.perlmonks.org?node_id=926029

jb60606 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm relatively new to Perl and hoping that someone might be able to point me in the right direction. I want to accept a group of one or more user-defined numbers ("thresholds") from a user running the script.
e.g. ./script 1000 8000 5000 2000

The script will read through each line of a file, searching for lines whose value (specifically, the 10th element of each line) exceeds one threshold without crossing the boundary of the next higher threshold.

ie: If the thresholds are 1000, 8000, 5000, 2000, then search each line for a threshold that is either
- Greater than 1000, but less than 2500
- Greater than 8000
- Greater than 5000, but less than 8000
- Greater than 2500, but less than 5000

FYI: Each line of the file will resemble the following. The only part of this line that I am concerned with is the "Pending=XXXXX" element

2011-09-12 10:20:54.473 hostname [13996,14019]: (WALL, MDConnection.cpp:420) Received data on Connection[OPRA-GRP13-16]. Pending=220. Total Messages processed by Connection [OPRA-GRP13-16]=97190000

I have a simple script that asks the user for a single threshold, then looks at the 10th element of each line, tallying each instance for a final "count" of how many lines exceed this threshold. But I am having a lot of trouble articulating how to extend this: first looping through the array of thresholds the user gave me, then using that to create a single line of code (an if statement, for example) that meets the criteria of the user-specified thresholds. Is there an easy way to do this? Thanks

I'm not sure if it will be any use, but I can provide code examples of my current script, if necessary.

Re: Any easy way to do this?
by graff (Chancellor) on Sep 15, 2011 at 02:26 UTC
    So, you're expecting a user to come up with a set of threshold values to put on the command line? How likely is it, really, that a user will want to try a bunch of different variations of threshold values? (In fact, how likely is it that a user already knows what ranges of values are going to be useful?)

    Any a priori assumptions that might provide sensible aid to reduce the user's "cognitive load" would be worth building into the script -- e.g. maybe threshold values should always be evenly spaced over an appropriate range, and users would just say how many thresholds (histogram bins) they want on a given run.
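    A minimal sketch of the evenly spaced idea, for illustration only. The helper name `even_thresholds` and its parameters are hypothetical and not part of the original script; it assumes the user supplies a range and a bin count instead of individual thresholds.

```perl
use strict;
use warnings;

# Hypothetical helper: return the $nbins - 1 interior boundaries that
# split the range [$low, $high] into $nbins equal-width bins.
sub even_thresholds {
    my ( $low, $high, $nbins ) = @_;
    die "need at least 2 bins\n" if $nbins < 2;
    my $step = ( $high - $low ) / $nbins;
    return map { $low + $_ * $step } 1 .. $nbins - 1;
}

my @thresh = even_thresholds( 0, 10000, 4 );
print "@thresh\n";    # 2500 5000 7500
```

    The resulting list can be used wherever an array of sorted thresholds is expected.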

    Regarding the code you posted, I'd offer a few "stylistic" points:

    use Getopt::Long; # or Getopt::Std, which might be easier to grok.
    That will make it easy to offer useful default values for things like number of bins, start-time and end-time. There could even be a default value for the name of the log file to read.
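    As a rough sketch of what that could look like: the option names (`--log`, `--start`, `--end`, `--thresh`), the default file name `feed.log`, and the `parse_args` wrapper are all assumptions made for illustration. The default thresholds are the ones mentioned elsewhere in this thread as the likely common case.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical option names and defaults, chosen only for illustration.
sub parse_args {
    my @args = @_;
    my %opt  = (
        log    => 'feed.log',    # assumed default log file name
        start  => '00:00:00',
        end    => '23:59:59',
        thresh => [],
    );
    GetOptionsFromArray(
        \@args,
        'log=s'     => \$opt{log},
        'start=s'   => \$opt{start},
        'end=s'     => \$opt{end},
        'thresh=i@' => $opt{thresh},    # repeatable: --thresh 1000 --thresh 5000
    ) or die "usage: $0 [--log FILE] [--start HH:MM:SS] [--end HH:MM:SS] [--thresh N ...]\n";

    # Fall back to a default set when no thresholds were given.
    $opt{thresh} = [ 1000, 5000, 10000 ] unless @{ $opt{thresh} };
    return \%opt;
}

my $opt = parse_args(@ARGV);
print "log=$opt->{log} thresholds=@{ $opt->{thresh} }\n";
```

    Run with no arguments, this falls back to every default; each option overrides just one of them.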

    Perl gives a warning about line 59 -- it's harmless, but worth fixing.

    When there's an "if" block that always ends with "exit 1" (which should just be "die"), there's no need for an "else" block after that (you can eliminate a layer of embedding). Likewise, you don't need an "else" block that contains just a next statement, given that there's nothing after that block in the enclosing loop.
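    To illustrate the point about dropping the "else" layers, here is a small sketch. The `check_args` wrapper and the sample values are hypothetical stand-ins; the argument-count condition mirrors the one in the posted script.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the script's argument check: die on bad usage,
# then carry on at the same nesting level with no "else" block.
sub check_args {
    my @args = @_;
    die "usage: readLog.pl log-file start-time end-time threshold [date]\n"
        if @args < 4 or @args > 5;
    my ( $logFile, $sTime, $eTime, $tHold ) = @args;
    return ( $logFile, $sTime, $eTime, $tHold );
}

# Inside a loop, "next unless ..." replaces "if (...) {...} else { next; }".
my @kept;
for my $val ( 120, 7000, 30, 9500 ) {    # made-up values
    next unless $val >= 5000;            # skip values under the threshold
    push @kept, $val;
}
print "@kept\n";    # 7000 9500
```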

    Assuming you have an array of threshold values, you just need to make sure the array values are sorted, and loop over them to work out which bin a given value should be counted in -- here's a simple example that leaves aside all your other issues about selecting/excluding log entries:

    my @thresh = ( 1000, 4000, 7000, 10000 );
    my @bins;
    while (<LOG>) {
        my $val = ( split )[10];
        next unless ( $val =~ /^\d+$/ );
        my $i;
        for $i ( 0 .. $#thresh ) {
            last if ( $val < $thresh[$i] );
        }
        $bins[$i]++;
    }
    (UPDATED to give appropriate scope to $i -- thanks to wfsp for pointing that out.)

    Geez! As GrandFather points out below, I really didn't get that right. Even after wfsp had told me it wouldn't work, I still had it wrong. What I should have suggested was something like this (thanks, GrandFather):

    my @thresh = ( 1000, 4000, 7000, 10000 );
    my @bins;
    while (<LOG>) {
        my $val = ( split )[10];
        next unless ( $val =~ /^\d+$/ );
        my $i = 0;
        while ( $i < @thresh and $val > $thresh[$i] ) {
            $i++;
        }
        $bins[$i]++;
    }
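    A self-contained sketch of the same binning loop, for experimenting without a log file. The helper names `bin_counts` and `pending_value` and the sample values are made up for illustration. It also assumes the value is extracted with a regex on "Pending=NNN." rather than by field position, since the raw field in the log carries the "Pending=" prefix and a trailing dot.

```perl
use strict;
use warnings;

# Count values into bins delimited by the thresholds. Bin 0 holds values at
# or below the lowest threshold; the last bin holds values above the highest.
sub bin_counts {
    my ( $thresh, $values ) = @_;
    my @sorted = sort { $a <=> $b } @$thresh;    # command-line order may be arbitrary
    my @bins   = (0) x ( @sorted + 1 );
    for my $val (@$values) {
        my $i = 0;
        $i++ while $i < @sorted and $val > $sorted[$i];
        $bins[$i]++;
    }
    return @bins;
}

# Pull the number out of "Pending=220." instead of relying on field position.
sub pending_value {
    my ($line) = @_;
    return $line =~ /\bPending=(\d+)/ ? $1 : undef;
}

my @thresh = ( 1000, 8000, 5000, 2000 );    # unsorted, as given by the user
my @sample = map { "Received data on Connection[X]. Pending=$_. Total" }
    ( 220, 1500, 2000, 6000, 9000 );        # made-up values
my @vals = grep { defined } map { pending_value($_) } @sample;
my @bins = bin_counts( \@thresh, \@vals );
print "@bins\n";    # 1 2 0 1 1
```

    With the thresholds sorted to 1000, 2000, 5000, 8000, the five counts correspond to the ranges up to 1000, 1001-2000, 2001-5000, 5001-8000, and above 8000.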

      Thanks graff, I'll give this a try tonight or tomorrow.

      Realistically, the user will likely specify thresholds between 1000 and 10,000 spaced by about 5000. So in all likelihood 1000, 5000 and 10000 are probably the only thresholds that this script will ever see, but I wanted to leave the user's options open.

      You're probably right, and I should just hard-code a range of thresholds to be run by default and maybe provide an override to run the script using a single user-defined threshold.

        Note the correction to my code snippet -- if $i were lexically scoped in the "for" statement (as originally posted), it would be unavailable after exiting that loop.
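        A short demonstration of the scoping issue, with a made-up value of 5000. In Perl, a foreach loop localizes its loop variable even when that variable was declared beforehand with "my", which is why declaring $i outside the loop is not enough and a while loop is the safer choice here.

```perl
use strict;
use warnings;

my @thresh = ( 1000, 4000, 7000, 10000 );
my $val    = 5000;    # made-up value

# foreach localizes its loop variable, even one declared beforehand with
# "my", so the outer $i gets its old (undefined) value back after the loop.
my $i;
for $i ( 0 .. $#thresh ) {
    last if $val < $thresh[$i];
}
print defined($i) ? "i=$i\n" : "i is undef after foreach\n";

# A while loop leaves the counter intact after it finishes.
my $j = 0;
$j++ while $j < @thresh and $val > $thresh[$j];
print "j=$j\n";    # j=2
```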
Re: Any easy way to do this?
by toolic (Bishop) on Sep 15, 2011 at 00:10 UTC
    but I can provide code examples of my current script
    Please do. Make it so that we can run it too. Provide a few lines of your input file, and provide the output you expect to eliminate any ambiguity. See also: histogram

    Did you really mean 2000, not 2500?

      #!/usr/bin/perl
      # Description: Read SRLabs feed handler log file, search for and print "pending queues" exceeding a given threshold within a given time-frame.
      # Usage: Run script without command line arguments for usage details
      use strict;
      my $hostname = `hostname -s`;
      $hostname =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
      my $total; # variable to be used for queue total for the given timeframe
      my $count = 0;
      # Check usage. Exit if incorrect, then display usage details
      my $numArgs = $#ARGV + 1;
      if (($numArgs <= 3) || ($numArgs > 5)) {
          print "\nusage:\n";
          print usage();
          exit 1;
      } else {
          # Get command line arguments, convert time to seconds
          my $logFile = $ARGV[0];
          my $sTime= $ARGV[1];
          my @sTime=split(/:/,$sTime); # split start time
          my $sSecs=$sTime[0] * 3600 + $sTime[1] * 60 + $sTime[2]; # convert start-time to seconds
          my $eTime = $ARGV[2];
          my @eTime=split(/:/,$eTime); # split end time
          my $eSecs=$eTime[0] * 3600 + $eTime[1] * 60 + $eTime[2]; # convert stop-time to seconds
          my $tHold = $ARGV[3];
          my $date = $ARGV[4];
          # Get today's date which will be used as the default. Will add option to enter date, manually, soon
          my($day, $month, $year) = (localtime)[3,4,5];
          $month = sprintf '%02d', $month+1;
          $day = sprintf '%02d', $day;
          $year = $year+1900;
          #my $ymd = "$year-$month-$day";
          my $ymd = "2011-09-12";
          open LOGFILE, "<", "$logFile" or die $!;
          while (<LOGFILE>){
              my $line=$_;
              chomp;
              my @data=split(/ /,$line); # split the line up
              unless (($data[10] =~ m/Pending=/)) { next; } # skip elements we don't want
              my $lineInfo = $data[9];
              $lineInfo =~ s/[Connection\[\].]//g;
              $data[10] =~ s/[A-Za-z=.]//g; # delete "Pending", "=" and "."
              $data[1] =~ s/\..*//g; # delete the millisecond element of the (current)cTime var
              my @cTime=split(/:/,$data[1]); # split the current time
              my $curSecs=$cTime[0] * 3600 + $cTime[1] * 60 + $cTime[2]; # convert current-time to seconds
              if (($data[0] eq $ymd) && ($data[10] >= $tHold) && ($curSecs >= $sSecs) && ($curSecs <= $eSecs)) {
                  $count+1;
                  $total += $data[10];
                  $count++;
              } else {
                  next;
              }
          }
          print "$hostname,$ymd,$sTime-$eTime,$tHold,$count,$total\n";
          print "\n$hostname ($ymd)\n";
          print "$count pending queues meet or exceed $tHold\n";
          print "Aggregate ($sTime to $eTime): $total\n\n"; # print the Queue for the timeframe
      }
      close LOGFILE;
      sub usage {
          print "\n\./readLog.pl [log-file] [start-time] [end-time] [threshold]\n\n";
          print " log-file Log file name\n";
          print " start-time Start time as HH:MM [e.g. \"06:00\"]\n";
          print " end-time End time as HH:MM [e.g. \"13:15\"]\n";
          print " max-pending Pending queue threshold [e.g. \"1000\"]\n\n";
      }

      -------------------------------------------------------------------------------------------------------------------

      Example log file output

      2011-09-12 10:32:16.285 pmmd-ltc-fsrlabs02 13996,14019: (WALL, MDConnection.cpp:420) Received data on ConnectionOPRA-GRP41-44. Pending=0. Total Messages processed by ConnectionOPRA-GRP41-44=105700000
      2011-09-12 10:32:16.499 pmmd-ltc-fsrlabs02 13996,14019: (WALL, MDConnection.cpp:420) Received data on ConnectionOPRA-GRP17-20. Pending=0. Total Messages processed by ConnectionOPRA-GRP17-20=121930000
      2011-09-12 10:32:16.876 pmmd-ltc-fsrlabs02 13996,14019: (WALL, MDConnection.cpp:420) Received data on ConnectionOPRA-GRP33-36. Pending=0. Total Messages processed by ConnectionOPRA-GRP33-36=106670000
      2011-09-12 10:32:16.935 pmmd-ltc-fsrlabs02 13996,14019: (WALL, MDConnection.cpp:420) Received data on ConnectionOPRA-GRP45-48. Pending=0. Total Messages processed by ConnectionOPRA-GRP45-48=98140000
      2011-09-12 10:32:16.966 pmmd-ltc-fsrlabs02 13996,14019: (WALL, MDConnection.cpp:420) Received data on ConnectionOPRA-GRP5. Pending=0. Total Messages processed by ConnectionOPRA-GRP5=31930000
      2011-09-12 10:32:17.073 pmmd-ltc-fsrlabs02 13996,14019: (WALL, MDConnection.cpp:420) Received data on ConnectionOPRA-GRP17-20. Pending=0. Total Messages processed by ConnectionOPRA-GRP17-20=121940000
      2011-09-12 10:32:17.123 pmmd-ltc-fsrlabs02 13996,14019: (WALL, MDConnection.cpp:420) Received data on ConnectionOPRA-GRP29-32. Pending=0. Total Messages processed by ConnectionOPRA-GRP29-32=108610000
      2011-09-12 10:32:17.172 pmmd-ltc-fsrlabs02 13996,14019: (WALL, MDConnection.cpp:420) Received data on ConnectionOPRA-GRP37-38. Pending=0. Total Messages processed by ConnectionOPRA-GRP37-38=63700000
      2011-09-12 10:32:17.196 pmmd-ltc-fsrlabs02 13996,14019: (WALL, MDConnection.cpp:420) Received data on ConnectionOPRA-GRP9-12. Pending=0. Total Messages processed by ConnectionOPRA-GRP9-12=119390000
      2011-09-12 10:32:17.236 pmmd-ltc-fsrlabs02 13996,14019: (WALL, MDConnection.cpp:420) Received data on ConnectionOPRA-GRP13-16. Pending=87. Total Messages processed by ConnectionOPRA-GRP13-16=104110000
      2011-09-12 10:32:17.248 pmmd-ltc-fsrlabs02 13996,14019: (WALL, MDConnection.cpp:420) Received data on ConnectionOPRA-GRP39-40. Pending=6. Total Messages processed by ConnectionOPRA-GRP39-40=51620000
      2011-09-12 10:32:17.301 pmmd-ltc-fsrlabs02 13996,14019: (WALL, MDConnection.cpp:420) Received data on ConnectionOPRA-GRP41-44. Pending=341. Total Messages processed by ConnectionOPRA-GRP41-44=105710000
      2011-09-12 10:32:17.330 pmmd-ltc-fsrlabs02 13996,14019: (WALL, MDConnection.cpp:420) Received data on ConnectionOPRA-GRP1-4. Pending=2. Total Messages processed by ConnectionOPRA-GRP1-4=93230000
      2011-09-12 10:32:17.374 pmmd-ltc-fsrlabs02 13996,14019: (WALL, MDConnection.cpp:420) Received data on ConnectionOPRA-GRP7-8. Pending=0. Total Messages processed by ConnectionOPRA-GRP7-8=63680000
      2011-09-12 10:32:17.390 pmmd-ltc-fsrlabs02 13996,14019: (WALL, MDConnection.cpp:420) Received data on ConnectionOPRA-GRP25-28. Pending=0. Total Messages processed by ConnectionOPRA-GRP25-28=105010000
      2011-09-12 10:32:17.392 pmmd-ltc-fsrlabs02 13996,14019: (WALL, MDConnection.cpp:420) Received data on ConnectionOPRA-GRP21-24. Pending=2. Total Messages processed by ConnectionOPRA-GRP21-24=89610000
      2011-09-12 10:32:17.418 pmmd-ltc-fsrlabs02 13996,14019: (WALL, MDConnection.cpp:420) Received data on ConnectionOPRA-GRP6. Pending=0. Total Messages processed by ConnectionOPRA-GRP6=26110000

      -------------------------------------------------------------------------------------------------------------------

      hostname,2011-09-14,8:30-15:00,5000,492,3704405

      hostname (2011-09-14)
      492 pending queues meet or exceed 5000
      Aggregate (8:30 to 15:00): 3704405

      I did mean 2500, but forgot to change the earlier thresholds. I only put 2500 in, to make it known that the thresholds won't be static and are completely decided by the user

        Oops... that logfile excerpt needs to be in a code-tag, too. Can you please edit the post? We only need to see one or two of the lines.
        Pardon the sloppy code; I'm sure there are much easier ways to write that script. I'm still learning and just needed to whip something up quickly.
        Hey jb, I'm new to perl and wanted to know if you can explain the variables for your script. Thanks AJ