http://www.perlmonks.org?node_id=230355

Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

All:
I have spent the last 3 weeks converting a suite of shell scripts to Perl. The purpose of this can be found here, although two of my initial requirements changed after a very long, hard look at the transient files I was checking against.

  • Only the first 64K of the file needs to be read - if the string I am looking for is not there, it doesn't matter if it is elsewhere in the file.
  • Removing embedded newlines wasn't a real requirement, since a match in the first 64K will never have the newline problem.
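    For anyone unfamiliar with the trick used to honor the 64K limit: setting `$/` to a reference to an integer switches the readline operator from line-at-a-time reads to fixed-size record reads. A minimal sketch (the helper name and file handling here are illustrative, not taken from the script below):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Setting $/ to a reference to an integer puts <FH> into
# fixed-size record mode instead of line mode.
sub first_64k {
    my ($path) = @_;           # hypothetical helper, for illustration
    local $/ = \65536;         # each read returns at most 64K
    open my $fh, '<', $path or return undef;
    my $chunk = <$fh>;         # first (and only) record we care about
    close $fh;
    return $chunk;
}
```

    Because `local` restores `$/` on scope exit, the record-mode setting does not leak into the rest of the program.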

    The following is the final code:

    #!/usr/bin/perl -w
    use strict;
    use Time::Local;

    chdir "/var/spool/wt400/gateways/" . $ARGV[0];
    mkdir "capture", 0755 unless (-d "capture");

    my $ListTime = 0;
    my %Traps;
    my @Files;
    my $Counter = 1;
    my $Size;
    my $Now;
    my $NF;
    my $Matcher;
    my $Match_code;

    open (LOG, ">>/disk4/Logs/traps/" . $ARGV[0] . "_" . $ARGV[1] . ".log");
    flock(LOG, (2|4)) or exit;
    select LOG;

    while (1) {
        # Re-read the traplist every 20 passes (or on startup) if it changed
        if ($Counter > 20 || ! %Traps) {
            if ( (stat("traplist." . $ARGV[1]))[9] > $ListTime ) {
                $ListTime = (stat(_))[9];
                %Traps = ();
                open (LIST, "traplist." . $ARGV[1]);
                while (<LIST>) {
                    next if (/^#/ || /^Created\t\tExpires/ || /^\s*$/);
                    my @Fields = split "\t", $_;
                    next unless (@Fields == 8);
                    chomp $Fields[7];
                    my ($mon, $day, $year, $hour, $min) = split ?[-/:]?, $Fields[1];
                    my $Expiration = timelocal(0, $min, $hour, $day, $mon - 1, $year + 100);
                    $Traps{$Fields[7]} = [ $Expiration, @Fields[2,5,6] ];
                }
                close (LIST);
            }
            $Counter = 1;
        }
        $Now = time;
        $Match_code = "";
        $Size = 0;
        foreach my $Trap (keys %Traps) {
            unless ($Traps{$Trap}[0] < $Now && $Traps{$Trap}[1]) {
                if ($Traps{$Trap}[3] eq "SIZE") {
                    $Size = $Traps{$Trap}[2] if ($Traps{$Trap}[2] > 0);
                }
                else {
                    $Trap =~ s/(\W)/\\$1/g;
                    $Trap = "(?i-xsm)" . $Trap;
                    $Match_code .= "return \"$Trap\" if \$_[0] =~ /$Trap/;";
                }
            }
        }
        exit unless ($Match_code || $Size);
        $Matcher = eval "sub {" . $Match_code . "}";
        if ($ARGV[1] eq "out") {
            @Files = <out/do*>;
        }
        elsif ($ARGV[1] eq "in") {
            @Files = <in/di*>;
        }
        else {
            @Files = <out/do* in/di*>;
        }
        matchfile(\@Files);
        $Counter++;
        sleep 3;
    }

    sub matchfile {
        local($/) = \65536;    # read in 64K records
        FILE: while (my $File = shift @{$_[0]}) {
            if ($Size && -s $File >= $Size) {
                ($NF = $File) =~ s/^.*\///;
                rename $File, "capture/" . $NF . "-SIZE";
                print time . " " . $NF . " " . (stat(_))[7] . " SIZE\n";
                next FILE;
            }
            unless (open(FILE, $File)) {
                next FILE;
            }
            while (<FILE>) {
                my $Match = $Matcher->($_);
                if ($Match) {
                    $Match =~ s/\(\?i-xsm\)//;
                    ($NF = $File) =~ s/^.*\///;
                    rename $File, "capture/" . $NF . "-" . $Traps{$Match}[3];
                    print time . " " . $NF . " " . (stat(_))[7] . " " . $Traps{$Match}[3] . "\n";
                }
                next FILE;    # only the first 64K record is examined
            }
        }
    }

    The traplist file that the data is read from looks like:

    Created         Expires         Use     Type    Author  Size    Name    Trap
    07:36:56-07:36  07:36:56-07:36  1       0       XYZ     98765   SIZE    N/A
    07:36:56-07:36  07:36:56-07:36  1       0       XYZ     N/A     TRAP1   cool things to look for
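    For reference, here is how one such line is unpacked: tab-split into eight fields, with the Expires field decoded by `timelocal`. The sample line below is invented (the dates in the listing above are placeholders), and it assumes an Expires layout along the lines of `mon-day/yy:hour:min`, which is what the script's `split` on `[-/:]` implies:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local;

# An invented traplist line; column order follows the header row:
# Created  Expires  Use  Type  Author  Size  Name  Trap
my $line = join("\t",
    "12-31/02:23:59", "12-31/02:23:59", "1", "0",
    "XYZ", "N/A", "TRAP1", "cool things to look for") . "\n";

my @Fields = split "\t", $line;
die "expected 8 fields" unless @Fields == 8;
chomp $Fields[7];

# Expires (field 1) splits on -, / or : into month, day, year, hour, minute
my ($mon, $day, $year, $hour, $min) = split /[-\/:]/, $Fields[1];
my $Expiration = timelocal(0, $min, $hour, $day, $mon - 1, $year + 100);

# The trap text keys a record of [expiration, Use, Size, Name]
my %Traps = ($Fields[7] => [ $Expiration, @Fields[2,5,6] ]);
```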
    

  • The first argument is the name of the directory to look for the traplist file in, as well as the base directory to work from (in combination with the second argument).
  • The second argument supplies the second piece of information needed to find the traplist file, as well as the directory to work in.

    If arg1 = blah, you would look for the traplist file in /var/spool/wt400/gateways/blah
    If arg2 = out, you would open /var/spool/wt400/gateways/blah/traplist.out and you would do your work in /var/spool/wt400/gateways/blah/out
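    That mapping can be sketched as a small helper (the function name is mine, not the script's):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Derive the traplist path and working directory from the two arguments.
sub paths_for {
    my ($gateway, $direction) = @_;    # e.g. ("blah", "out")
    my $base = "/var/spool/wt400/gateways/$gateway";
    return (
        traplist => "$base/traplist.$direction",
        workdir  => "$base/$direction",
    );
}

my %p = paths_for("blah", "out");
# $p{traplist} is "/var/spool/wt400/gateways/blah/traplist.out"
# $p{workdir}  is "/var/spool/wt400/gateways/blah/out"
```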

  • Ok, so without further ado - here is my problem:

    I need to have about 20 copies of the exact same script running, where the only difference is the two arguments passed to each, because there is a race condition beyond my control. Now I am using far more memory than the shell scripts ever did. I compared:

  • ps -el | grep <shell> - sz = 50
  • ps -el | grep <perl> - sz > 300

    I know where the gap is coming from, and given everything else I gained, I could live with the difference if it were only one copy; but that difference gets multiplied by every copy running (about 20).

    The only thing that comes to mind is threads, but I have heard such conflicting information about them that I didn't even consider them when I started the port.

    Do I have to abandon my code, or is there a way to take advantage of my multi-processor, high-end server so that one (or maybe two or three) processes handle all the directories?

    Thanks in advance - L~R