http://www.perlmonks.org?node_id=1001558

cunningrat has asked for the wisdom of the Perl Monks concerning the following question:

Okay, monks.

I've got a batch of code that *works*, that I'm trying to optimize. It reads data from a CSV file, extracts the bits that I'm interested in, and stuffs those bits in a comma-separated format into a hash of arrays.

Code snippet is included below. Fair warning, it's likely to make a Perl guru's ears bleed. I'm an intermediate Perl user, I only pretend to be a guru at work. :p

my @diskline=`grep DISKBUSY $nmondir/$files_by_date{$date} | grep -v B +usy`; foreach my $line (@diskline) { chomp ($line); my @diskarray=split /,/, $line; shift @diskarray; shift @diskarray; for my $i (0..$#diskarray) { $disks_by_date{$date}[$i]=join ',', $disks_by_date{$date}[$ +i],$diskarray[$i] } } my @memline=`grep ^MEM, $nmondir/$files_by_date{$date} | grep -v Memo +ry`; foreach my $line (@memline) { chomp ($line); my @memarray=split /,/, $line; my $rused=100-$memarray[2]; my $mused=100-$memarray[3]; $mem_by_date{$date}[0]=join ',', $mem_by_date{$date}[0],$rused; $mem_by_date{$date}[1]=join ',', $mem_by_date{$date}[1],$mused; } my @pageline=`grep ^PAGE, $nmondir/$files_by_date{$date} | grep -v Pa +ging`; foreach my $line (@pageline) { chomp ($line); my @memarray=split /,/, $line; my $fsin=$memarray[3]-$memarray[5]; my $fsout=$memarray[4]-$memarray[6]; my $recrate= ($memarray[7] == 0) ? 0 : $memarray[8]/$memarray[7] +; $mem_by_date{$date}[2]=join ',', $mem_by_date{$date}[2],$fsin; $mem_by_date{$date}[3]=join ',', $mem_by_date{$date}[3],$fsout; $mem_by_date{$date}[4]=join ',', $mem_by_date{$date}[4],$recrate +; } my $numinterfaces=int(($#{$net_by_date{$date}}+1)/7); my $numl1=$numinterfaces*2; my $numl2=$numinterfaces*4; my @dataline1=`grep ^NET, $nmondir/$files_by_date{$date} | grep -v + Network`; foreach my $line (@dataline1) { chomp ($line); my @netarray=split /,/, $line; shift @netarray; shift @netarray; for my $i (0..$#netarray) { $net_by_date{$date}[$i]=join ',',$net_by_date{$date[$i],$netarra +y[$i]; } } my @dataline2=`grep NETPACKET $nmondir/$files_by_date{$date} | grep + -v Network`; foreach my $line (@dataline2) { chomp ($line); my @netarray=split /,/, $line; shift @netarray; shift @netarray; for my $i (0..$#netarray) { $net_by_date{$date}[$i+$numl1]=join ',',$net_by_date{$date}[$i+$ +numl1],$netarray[$i]; } } my @dataline3=`grep NETERROR $nmondir/$files_by_date{$date} | grep + -v Network`; foreach my $line (@dataline3) { chomp ($line); my @netarray=split /,/, $line; shift @netarray; shift @netarray; for my $i (0..$#netarray) { $net_by_date{$date}[$i+$numl2]=join ',',$net_by_date{$date}[$i+$ +numl2],$netarray[$i]; } }

Here's the problem. I keep thinking that there's *got* to be a way to extract most of this mess into a nice neat subroutine, but the different logic needed to handle different lines keeps tripping me up. I.e. I'm interested in everything for @diskline, fields 2 and 3 from @memline, fields 3-7 from @pageline (which get stuffed into the same array as data from @memline)... etcetera.

Any suggestions would be greatly appreciated...

(Yes, I know I'm trying to parse NMon output. Yes, I know about Nmon Analyzer. No, it won't work for me in this situation.)

Replies are listed 'Best First'.
Re: Extracting code into a subroutine
by tobyink (Canon) on Oct 30, 2012 at 21:54 UTC

    It seems you could factor out some repetition, yes. Given this...

    my @diskline=`grep DISKBUSY $nmondir/$files_by_date{$date} | grep -v B +usy`; foreach my $line (@diskline) { chomp ($line); my @diskarray=split /,/, $line; shift @diskarray; shift @diskarray; for my $i (0..$#diskarray) { $disks_by_date{$date}[$i]=join ',', $disks_by_date{$date}[$i],$d +iskarray[$i] } } my @memline=`grep ^MEM, $nmondir/$files_by_date{$date} | grep -v Memor +y`; foreach my $line (@memline) { chomp ($line); my @memarray=split /,/, $line; my $rused=100-$memarray[2]; my $mused=100-$memarray[3]; $mem_by_date{$date}[0]=join ',', $mem_by_date{$date}[0],$rused; $mem_by_date{$date}[1]=join ',', $mem_by_date{$date}[1],$mused; }

    You could factor out the execution of the command, looping through lines, chomping, splitting and shifting off fields like this...

    sub process (&$;$) { my ($code, $cmd, $skipfields) = @_; my @lines = `$cmd`; for (@lines) { chomp; my @array = split /,/; if ($skipfields) { shift @array for 1..$skipfields; } local $_ = \@array; $code->(); } } process { for my $i (0 .. $#$_) { $disks_by_date{$date}[$i] = join( q[,], $disks_by_date{$date}[$i], $_->[$i], ); } } q{grep DISKBUSY $nmondir/$files_by_date{$date} | grep -v Busy}, 2; process { for my $i (0 .. 1) { my $used = 100 - $_->[2 + $i]; $mem_by_date{$date}[$i] = join( q[,], $mem_by_date{$date}[$i], $used ); } } q{grep ^MEM, $nmondir/$files_by_date{$date} | grep -v Memory};

    Actually, maybe we can go further and factor out those joins...

    sub joiny { my ($ref, $thing) = @_; $$ref = join q[,], $$ref, $thing; } sub process (&$;$) { my ($code, $cmd, $skipfields) = @_; my @lines = `$cmd`; for (@lines) { chomp; my @array = split /,/; if ($skipfields) { shift @array for 1..$skipfields; } local $_ = \@array; $code->(); } } process { for my $i (0 .. $#$_) { joiny \($disks_by_date{$date}[$i]), $_->[$i]; } } q{grep DISKBUSY $nmondir/$files_by_date{$date} | grep -v Busy}, 2; process { for my $i (0 .. 1) { my $used = 100 - $_->[2 + $i]; joiny \($mem_by_date{$date}[$i]), $used; } } q{grep ^MEM, $nmondir/$files_by_date{$date} | grep -v Memory};

    None of this is tested, as I don't have your source files to test against.

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: Extracting code into a subroutine
by space_monk (Chaplain) on Oct 31, 2012 at 06:49 UTC

    My first thought on reading this is why are you writing lots of this stuff when Text::CSV and similar libraries will probably do all this for you, in a standard maintainable manner?

    It seems you're committing the classic mistake of reinventing the wheel.....

Re: Extracting code into a subroutine
by grizzley (Chaplain) on Oct 31, 2012 at 09:59 UTC
    First write differences down. Then design subroutine arguments. Then desing subroutine. What are the differences?
    • List of field indexes.
    • grep regexp (NETPACKET, NETERROR)
    • grep regexp (Network, Busy, Memory)
    • what to do with arguments?
    • others?
    Then subroutine arguments:
    sub yoursub { my ($rangearrayref, $firstre, $secondre, $subref) = @_; ... } yoursub([2..7], 'NETPACKET', 'Network', \&transformNetpacketSub);
    or hash style:
    sub yoursub { my %args = @_; my ($rangearrayref, $firstre, $secondre, $subref) = ($args{-range}, +$args{-firstre}, $args{-secondre}, $args{-transformsub}); ... } yoursub(-range => [2..7], -firstre=>'NETPACKET', -secondre=>'Network', + ->transformsub=>\&transformNetpacketSub);
    And then the only problem is to write proper transformNetpacketSub. I would do it like this:
    sub transformNetpacketSub { my $str = shift; # do anything with string return $str; }
      Thanks to all who responded. It definitely gave me some ideas on how to proceed, which was all I needed.