http://www.perlmonks.org?node_id=142909


in reply to New Perl User Question

I know this code is clunky, and probably a lot longer than it needs to be, suggestions on shortening it would also be appreciated.

Careful what you wish for:

Take a look at perlstyle. It is included with the standard perl distribution. One style rule mentions the use of the underscore _ to separate the words in variable names. This makes reading variables faster and easier especially for non-native speakers of English. I also prefer to avoid mixed case variables and subroutine names to avoid miss-typing.

Other style rules mentioned in perlstyle include: always use spaces around operators, use 4-spaces as tabs, and add a space after commas. These rules are not set in concrete. The best thing is find your style, compare it with that of others and then be consistent.

Keeping this in mind, We can apply BazB's advice:
my $process_file_name = shift @ARGV;
And reducing the comments some:
# file name for corrected data my $clean_file_name = "$processfilename.clean"; # numeric value of the "usual" line starting character my $correct_start_char = 16; # "usual" starting length of lines starting with $correct_start_char my $correct_start_length = 16; # correct length of lines after they have been stripped my $correct_clean_length = 13; # length of lines that do not include the extra stop and start charact +ers my $typed_length = 14; # Do not change these values, they are used to # report the number of lines read, and written my $raw_file_length = 0; my $clean_file_length = 0;

You might take alook at perlsub. It is also included with the standard perl distribution and gives some excellent reasons for not prefixing subroutine calls with the ampersand &. One more rule you might adopt is to avoid using global values in subroutines. When possible attempt to pass all the needed variables to a sub and return expected values from the sub. This makes the sub more portable and the program easier to maintain.

With that in mind, we could change:
&ProcessFile; to:
my($raw_file_length, $clean_file_length) = process_file( $process_file_name, $clean_file_name, $correct_start_char, $correct_start_length, $correct_clean_length, $typed_length);
All this might be shortened, without a loss of clarity with an array:
my @args = ( $ARGV[0], # file to process "$ARGV[0].clean", # processed file 16, # numeric value of the "usual" # line starting character 16, # "usual" starting length of lines # starting with $correct_start_char 13, # correct length of lines after they # have been stripped 14 # length of lines that do not include # the extra stop and start characters ); my($raw_file_length, $clean_file_length) = process_file( @args );

NOTE: We could pass @args with a reference, but I'm not willing to add too many new concepts at one time. perlref and perlsub have more information.

Before we examine the subroutine, let's add the end of the program:
print "$raw_file_length lines read from $args[0]\n"; print "$clean_file_length lines written to $args[1]\n"; print "\a"; exit(0);

For the subroutine, we'll examine the logic of the if/else and we'll use the substr function in place of the chomp/chop method you used.

chomp $data; chop $data; $data = reverse ($data); chop $data; $data = reverse ($data);
Could be replaced with:
$data = substr($data, 1, -2); And:
chomp $data; $datalengthtrack--; chop $data; $datalengthtrack--; $data = reverse ($data); while ($datalengthtrack > $correctcleanlength){ chop $data; $datalengthtrack--; } $data = reverse ($data); print CLEANFILE "$data\n";
Could be replaced with:
print CLEANFILE substr( substr($data, 0, -2), -$correct_clean_length), "\n"; The if/else block might be re-written as:
if ( $start_char == $correct_start_char ) { next if $data_length != $correct_start_length; if ( $data_length == $correct_start_length ) { print CLEANFILE substr($data, 1, -2), "\n"; $clean_file_length++; } } else { if ( $data_length > $correct_clean_length ) { print CLEANFILE substr( substr($data, 0, -2), -$correct_clean_ +length), "\n"; $clean_file_length++; } elsif ( $data_length == $typed_length ) { print CLEANFILE $data; $clean_file_length++; } }
We'll have to retrieve those variables we passed to the sub:
my ( $file_name, $clean_file_name, $correct_start_char, $correct_start_length, $correct_clean_length, $typed_length) = + @_;
And I took the liberty of shortening the file handle names:
open RAW, $file_name or die "Cannot open $file_name: $!"; open CLEAN, ">$clean_file_name" or die "Cannot open $clean_file_name: +$!";
And since we were actually keeping track of lines not file lengths, I used:
my($raw_lines, $clean_lines); in the sub, instead of:
my($raw_file_length, $clean_file_length) Putting it all together:
#!/usr/bin/perl #################################################### # InvClean.pl : Raw Scanned File Processing program # Version 1.0 # Written by Robb Pickinpaugh # 01/31/2002 # for use on Windows NT #################################################### use strict; use warnings; my @args = ( $ARGV[0], # file to process "$ARGV[0].clean", # processed file 16, # numeric value of the "usual" # line starting character 16, # "usual" starting length of lines # starting with $correct_start_char 13, # correct length of lines after they # have been stripped 14 # length of lines that do not include # the extra stop and start characters ); my($raw_file_length, $clean_file_length) = process_file( @args ); print "$raw_file_length lines read from $args[0]\n"; print "$clean_file_length lines written to $args[1]\n"; print "\a"; exit(0); sub process_file { my ( $file_name, $clean_file_name, $correct_start_char, $correct_start_length, $correct_clean_length, $typed_length) = + @_; open RAW, $file_name or die "Cannot open $file_name: $! +"; open CLEAN, ">$clean_file_name" or die "Cannot open $clean_file_na +me: $!"; my ($raw_lines, $clean_lines); while ( my $line = <RAW> ) { my $line_length = length $line; my $start_char = ord $line; if ( $start_char == $correct_start_char ) { next if $line_length != $correct_start_length; if ( $line_length == $correct_start_length ) { print CLEAN substr($line, 1, -2), "\n"; $clean_lines++; } } else { if ( $line_length > $correct_clean_length ) { print CLEAN substr( substr($line, 0, -2), -$correct_cl +ean_length), "\n"; $clean_lines++; } elsif ( $line_length == $typed_length ) { print CLEAN $line; $clean_lines++; } } $raw_lines = $.; } close RAW or die "cannot close $file_name: $!"; close CLEAN or die "cannot close $clean_file_name: $!"; return ($raw_lines, $clean_lines); } __END__



HTH,
Charles K. Clarkson
HTH, Charles K. Clarkson Clarkson Energy Homes, Inc.

Replies are listed 'Best First'.
Re: Re: New Perl User Question
by Rpick (Novice) on Feb 04, 2002 at 22:18 UTC
    Charles,

    Thanks for the warning on being careful what I wish.

    Actually what you showed me was exactly what I was looking for.
    My programming background is in C++, and a bit of VB, and VBA.
    I'm just starting out in perl, and wasn't aware of functions like substr.

    That was exactly the kind of explanation I was looking for.

    Thanks again.