My first perl script is working, what did I do wrong?

killersquirel11 has asked for the wisdom of the Perl Monks concerning the following question:

Hey all,

I've hacked a few perl scripts in my day, but this is the first time I've ever written one from scratch. Thus I want to get some feedback on the good, the bad, and the ugly of what I've done here

Essentially, the script's job is to parse a file containing a column of part names and several columns of description (the columns are fixed-width space-delimited), and output how many parts match each description. Below is the code:

#!/usr/bin/perl -w

use strict;
use warnings;

print "ARGV: $#ARGV\n";

if ($#ARGV != 0) {
     print "One, and only one (not $#ARGV), command line parameter is 
+expected\n";
     exit;
}
my $inputFile=$ARGV[0];

if ( not -e $inputFile) {
     die "$inputFile doesn't exist!!!\nExiting.\n";
}

print "Attempting to open $inputFile\n";

open my $file, $inputFile or die "Could not open $inputFile: $!";
my $headerLine= -1;
my $prtIndex  = -1;
my $shpIndex  = -1;
my $clrIndex  = -1;
my $sizIndex  = -1;
my %values    = ();
my $index     =  0;
my @parts     = ();
while( my $line = <$file>) {
    if ( $line =~ "Part *Shape *Color *Size" ){
        if($headerLine < 0){
            $headerLine=$.;
            $prtIndex=index $line, "Part";
            $shpIndex=index $line, "Shape";
            $clrIndex=index $line, "Color";
            $sizIndex=index $line, "Size";
        }
        next
    }
    if(not checkIndices($prtIndex, $shpIndex, $clrIndex, $sizIndex)){
        $headerLine=-1;
        next;
    }
    if(length($line)<$sizIndex){
        next;
    }
    my $prt = substr $line, $prtIndex, $shpIndex-1-$prtIndex;
    my $shp = substr $line, $shpIndex, $clrIndex-1-$shpIndex;
    my $clr = substr $line, $clrIndex, $sizIndex-1-$clrIndex;
    my $siz = substr $line, $sizIndex;
    trimall($prt,$shp,$clr,$siz);
    my $superstr=$shp.$clr.$siz;
    if(not exists $values{$superstr}){
        $values{$superstr}=$index;
        $parts[$index][0][0]=$prt;    #List of all parts
        $parts[$index][1]=$superstr;  #Value of the parts
        $parts[$index][2]=1;          #Total qty of these parts
        $index++;
    } else {
        my $i=$values{$superstr};
        $parts[$i][0][$parts[$i][2]] = $prt;
        $parts[$i][2]++;
    }
}

for my $i (0..$index-1){
    my @plist=$parts[$i][0];
    my $value=$parts[$i][1];
    my $count=$parts[$i][2];
    for my $j (0..$parts[$i][2]-1){
        printf $parts[$i][0][$j]." ";
    }   printf "- ";
    print $value." - ";
    print $count.".\n";
}
exit;

# Check if the indices for a number of input variables are all >= 0
sub checkIndices {
    my $count;
    for ($count=0; $count<=$#_; $count++){
        if($_[$count] < 0){
            return (1==0);
        }
    }
    return (1==1);
}

# Remove leading and trailing whitespace from all arguments
sub trimall {
    my $count;
    for ($count=0; $count<=$#_; $count++){
        $_[$count] =~ s/^\s*//; $_[$count] =~ s/\s*$//;
    }
    return;
}
[download]

And here is the test input file

Part     Shape      Color   Size
P1       Circle     Red     1   
P2       Square     Green   3   
P3       Rectangle  Red     4   
P4       Circle     Red     1   
P5       Square     Blue    1   
P6       Square     Green   3   
P7       Rectangle  Blue    4   
P8       Square     Red     2   
P9       Circle     Green   3   
P10      Circle     Blue    4
[download]

When all is said and done, it should output that P1 and P4 are equal, as are P2 and P6, which it does:

ARGV: 0
Attempting to open testFile.txt
P1 P4 - CircleRed1 - 2.
P2 P6 - SquareGreen3 - 2.
P3 - RectangleRed4 - 1.
P5 - SquareBlue1 - 1.
P7 - RectangleBlue4 - 1.
P8 - SquareRed2 - 1.
P9 - CircleGreen3 - 1.
P10 - CircleBlue4 - 1.
[download]

So now, I ask of ye, oh keepers of the ~~Scrolls~~ Scripts, what you would change about my work?

Thanks in advance!

Comment on My first perl script is working, what did I do wrong? Select or Download Code

Replies are listed 'Best First'.
Re: My first perl script is working, what did I do wrong? by toolic (Bishop) on Nov 08, 2012 at 19:52 UTC
If your part numbers will never have spaces in them, you could split your lines and build up a hash of arrays: use strict; use warnings; print "ARGV: $#ARGV\n"; if ($#ARGV != 0) { print "One, and only one (not $#ARGV), command line parameter is +expected\n"; exit; } my $inputFile=$ARGV[0]; if ( not -e $inputFile) { die "$inputFile doesn't exist!!!\nExiting.\n"; } print "Attempting to open $inputFile\n"; open my $file, $inputFile or die "Could not open $inputFile: $!"; my %parts; while (<$file>) { next if /^Part/; my ($part, $info) = split /\s+/, $_, 2; $info =~ s/\s//g; push @{ $parts{$info} }, $part; } for my $info (sort keys %parts) { print "@{ $parts{$info} } - $info - ", scalar @{ $parts{$info} }, +"\n"; } __END__ ARGV: 0 Attempting to open 1.txt P10 - CircleBlue4 - 1 P9 - CircleGreen3 - 1 P1 P4 - CircleRed1 - 2 P7 - RectangleBlue4 - 1 P3 - RectangleRed4 - 1 P5 - SquareBlue1 - 1 P2 P6 - SquareGreen3 - 2 P8 - SquareRed2 - 1 [download] With a little more work, you could sort on hash values to get the output in the order you want.	[reply] [d/l]
Re^2: My first perl script is working, what did I do wrong? by BillKSmith (Monsignor) on Nov 08, 2012 at 23:58 UTC
If you can depend on the fixed columns, use unpack the same way as split. Bill	[reply]
Re^3: My first perl script is working, what did I do wrong? by killersquirel11 (Novice) on Nov 09, 2012 at 00:10 UTC
Yes, I can depend on fixed columns more so than I can on the contents being nicely-formatted. Thanks for the quick replies.	[reply]
Re^2: My first perl script is working, what did I do wrong? by Anonymous Monk on Nov 09, 2012 at 11:36 UTC
1 == @ARGV or die "One, and only one argument expected\n"; 1 == @ARGV or die Usage();	[reply]
Re: My first perl script is working, what did I do wrong? by rjt (Curate) on Nov 08, 2012 at 21:16 UTC
You have a lot of options in your approach, here. I humbly offer my somewhat more Perl-ish approach to this problem. The basis of what I did is to get rid of hard-coded fields and allow you to specify them near the top of the script in `@fields`, building a regexp to capture them in the header line. I relaxed the usage of the program as well, to allow for specifying a filename on the command line, or, if none is specified, accepting input from STDIN. You can easily change the output format by modifying the final `for (sort keys %inv) { ... }` loop, and you can change how records are grouped by changing the `key()` function to suit your tastes. In both cases, I tried to stay with what you had, since I don't know what you want to ultimately do with the data. use strict; use warnings; use 5.12.00; die "usage: $0 [filename]\n" if (@ARGV > 1); # Set required fields and determine fixed widths my @fields = qw/Part Shape Color Size/; my $re; $re .= qr/(?<$_>$_\s)/ for @fields; my $header = <>; die "Header must match " . join(' ', @fields) unless ($header =~ /^$re +$/); my $expected_len = length($header); my $tmpl = join(' ', map { "A[".length($+{$_})."]" } @fields); my %inv; # Inventory; $inv{key(%rec)} while (<>) { if (length != $expected_len) { die sprintf("Got length of %d, expecting %d", length, $expecte +d_len); } my @rec = map { /(.+?)\s$/ } unpack $tmpl; # Get (trimmed) record +s my %rec = map { $_ => shift @rec } @fields; # Add the part to our inventory push @{$inv{key(%rec)}}, $rec{Part}; } # Now print out the inventory in the desired format for (sort keys %inv) { say join(' ',@{$inv{$_}}) . " - $_ - " . scalar @{$inv{$_}}; } # Our custom hashing function for inventory items. Expects %rec argume +nt sub key { my %rec = @_; $rec{Shape} . $rec{Color} . $rec{Size}; } [download]	[reply] [d/l] [select]
Re^2: My first perl script is working, what did I do wrong? by killersquirel11 (Novice) on Nov 09, 2012 at 00:43 UTC
Thanks for the reply, it actually looks like Perl! A couple quick questions on this: 1) Are both '`use 5.12.00`' and '`use strict`' necessary? From what I read, '`use 5.12.00`' implies '`use strict`'. 2) With the following line, I'm having a bit of difficulty understanding it. `$re .= qr/(?<$_>$_\s)/ for @fields;` [download] Is it essentially this?: `for my $field (@fields) $re=$re.qr/(?<$_>$_\s)/` [download]	[reply] [d/l] [select]
Re^3: My first perl script is working, what did I do wrong? by rjt (Curate) on Nov 09, 2012 at 01:23 UTC
You are correct, `use strict` is not required here. Sorry, old habits die hard. :-) Not quite. Your example will not compile, but a similarly expanded translation of what I wrote would look like: `for my $field (@fields) { $re .= qr/(?<$field>$field\s*)/; }` [download] Read $_ for a description of the `$_` special variable, but the short version is, if you do not supply a variable name to for, Perl will automatically assume `$_`. The regex itself might benefit from a bit more explanation. It's capturing all of the column names including trailing whitespace, and saving those as named captures in `%+` for use in the line that builds `$tmpl`: `my $tmpl = join(' ', map { "A[".length($+{$_})."]" } @fields);` That generates an (un)pack template string based on the field lengths read from the header, determining the length of each field in `@fields` by taking the length of the same-named capture in `%+` . I'm using map here to "map" the values in `@fields` to the list that I want: a list of the lengths of each field.	[reply] [d/l] [select]
Re: My first perl script is working, what did I do wrong? by marquezc329 (Scribe) on Nov 08, 2012 at 22:20 UTC
Hello killersquirel11. Looks like you've already been given some full rewrites. I just figured I'd give you some smaller points to think about in the future. if() vs unless() i.e. `if ($#ARGV != 0)` -> `unless ($#ARGV == 0)` `if (not -e $inputFile)` -> `unless (-e $inputFile)` Multiple Variable Declaration can be broken down from: `my $headerLine= -1; my $prtIndex = -1; my $shpIndex = -1; my $clrIndex = -1; my $sizIndex = -1;` [download] to: ~~`my ($headerLine, $prtIndex, $shpIndex, $clrIndex, $sizIndex) = -1`~~ `my ($headerLine, $prtIndex, $shpIndex, $clrIndex, $sizIndex) = (-1) x 5;` C style for loops like: `sub checkIndices { my $count; for ($count=0; $count<=$#_; $count++){ if($_[$count] < 0){ return (1==0); } } return (1==1); }` [download] Can be "Perl-ified" like: `sub checkIndices { for my $item (@_){ return (1==0) if ($item < 0) } return (1==1); }` [download]	[reply] [d/l] [select]
Re^2: My first perl script is working, what did I do wrong? by rjt (Curate) on Nov 08, 2012 at 22:55 UTC
Multiple Variable Declaration can be broken down from: `my $headerLine= -1; my $prtIndex = -1; my $shpIndex = -1; my $clrIndex = -1; my $sizIndex = -1;` to: `my ($headerLine, $prtIndex, $shpIndex, $clrIndex, $sizIndex) = -1;` No, this won't do what you expect. Given: `my ($a, $b, $c) = -1;` only `$a` will receive the value (`$b` and `$c` will be `undef`). You can, however, use: `my ($a, $b, $c) = (-1, -1, -1);` That being said, I've found more often than not, when I have to start assigning the same value to a bunch of variables, there's probably a deeper design decision I need to question. In this case, -1 is used as some sort of magic value to indicate the state of something. Although even better refactoring can be done in this case, at the very least leaving the variables as `undef` provides better information, especially if any of those variables might legitimately contain -1 in the future.	[reply] [d/l] [select]
Re^3: My first perl script is working, what did I do wrong? by Anonymous Monk on Nov 08, 2012 at 23:48 UTC
Thanks for the replies. In case you couldn't tell, I typically program in C (and sometimes Java, although really when I use Java I treat it a lot like C). I will definitely keep these pointers in mind as I continue to learn how to hack together Perl scripts.	[reply]
Re^4: My first perl script is working, what did I do wrong? by killersquirel11 (Novice) on Nov 08, 2012 at 23:58 UTC
Re^3: My first perl script is working, what did I do wrong? by marquezc329 (Scribe) on Nov 09, 2012 at 00:29 UTC
Thank you for your correction rjt. Crackers2 also notified me of this mistake. I edited my response to reflect an alternate method of accomplishing this.	[reply]
Re: My first perl script is working, what did I do wrong? by eyepopslikeamosquito (Archbishop) on Nov 09, 2012 at 07:13 UTC
Some suggestions below on how to improve the early part of your script. 1) Replace: `if ($#ARGV != 0) { print "One, and only one (not $#ARGV), command line parameter is +expected\n"; exit; }` [download] with: `@ARGV == 1 or die "usage: $0 inputfile\n";` [download] When a script encounters an error it should exit with a non-zero value (a bald `exit` returns a zero exit value), a usage statement is conventional, and `or die` is idiomatic. BTW, you should rarely need to use the cumbersome `$#array` notation. Using it here is unnecessary and the code is clearer using `@ARGV` instead. Another example, from Perl Best Practices, chapter 5, "Use negative indices when counting from the end of an array" is to prefer `$frames[-2]` to `$frames[$#frames-1]` because it is easier on the eye and avoids the undesirable repetition of the variable name (DRY). 2) Replace: `my $inputFile=$ARGV[0];` [download] with: `my $inputFile = shift;` [download] Using `shift` makes maintenance easier if you later add or remove command line arguments because you don't need to manually shuffle the `ARGV` indices (which are "magic numbers"). 3) Replace: `if ( not -e $inputFile) { die "$inputFile doesn't exist!!!\nExiting.\n"; }` [download] with: `-f $inputFile or die "input file '$inputFile' does not exist.\n";` [download] Using `-f` is more precise than `-e` because the first argument must be a file (not a directory, say). Using `or die` is idiomatic and more concise. 4) Replace: `open my $file, $inputFile or die "Could not open $inputFile: $!";` [download] with: `open my $file, '<', $inputFile or die "Could not open $inputFile: $!";` [download] As for why the three-argument form of open is preferred, the old two-argument form of `open` is subject to various security exploits as described at Opening files securely in Perl.	[reply] [d/l] [select]
Re: My first perl script is working, what did I do wrong? by space_monk (Chaplain) on Nov 09, 2012 at 01:40 UTC
First, its a good idea to seek advice and review. I regularly write stuff that I think is good until one of my colleagues points out that he would have done it better/ faster/ in less code by doing it another way.... :-) Second, the main mistake you made is that you didn't ask yourself what tasks in the above script could be done by standard PERL libraries instead of lovingly hand crafting every aspect. There are lots of text parsing libraries, which if used, would have the benefit of substantially simplifying your code. In future, spend a few minutes before commencing any programming task, asking yourself what someone is likely to have done before, and what you can borrow from others to simplify your task and to make your code more understandable and maintainable. A quick browse of CPAN or PerlMonks before you program will often save someone dissecting where you went wrong afterwards. That said, good luck in the future! :-) A Monk aims to give answers to those who have none, and to learn from those who know more.	[reply]
Re: My first perl script is working, what did I do wrong? by tospo (Hermit) on Nov 09, 2012 at 09:31 UTC
++ for the title :-)	[reply]

Back to Seekers of Perl Wisdom