<?xml version="1.0" encoding="windows-1252"?>
<node id="142909" title="Re: New Perl User Question" created="2002-02-02 08:10:49" updated="2005-07-21 01:30:44">
<type id="11">
note</type>
<author id="85412">
CharlesClarkson</author>
<data>
<field name="doctext">
&lt;p&gt;&lt;em&gt;I know this code is clunky, and probably a lot longer than
it needs to be, suggestions on shortening it would also be appreciated.&lt;/em&gt;&lt;/p&gt;
&lt;h4&gt;Careful what you wish for:&lt;/h4&gt;
&lt;p&gt;Take a look at [perlstyle]. It is included with the standard perl
distribution. One style rule mentions the use of the underscore &lt;code&gt;_&lt;/code&gt;
to separate the words in variable names. This makes reading variables faster
and easier especially for non-native speakers of English. I also prefer to
avoid mixed case variables and subroutine names to avoid miss-typing.&lt;/p&gt;
&lt;p&gt; Other style rules mentioned in 
[perlstyle] include: always use spaces around operators, use 4-spaces
as tabs, and add a space after commas.
These rules are not set in concrete. The best thing is find &lt;em&gt;your&lt;/em&gt;
style, compare it with that of others and then be consistent.&lt;/p&gt;
Keeping this in mind, We can apply [id://127547]'s [id://142775|advice]:&lt;BR&gt;
&lt;code&gt;my $process_file_name = shift @ARGV;
&lt;/code&gt;
&lt;readmore&gt;
And reducing the comments some:&lt;br&gt;
&lt;code&gt;# file name for corrected data
my $clean_file_name = "$processfilename.clean";

# numeric value of the "usual" line starting character
my $correct_start_char = 16;

# "usual" starting length of lines starting with $correct_start_char
my $correct_start_length = 16;

# correct length of lines after they have been stripped
my $correct_clean_length = 13;

# length of lines that do not include the extra stop and start characters
my $typed_length = 14;

# Do not change these values, they are used to
# report the number of lines read, and written
my $raw_file_length     = 0;
my $clean_file_length   = 0;
&lt;/code&gt;
&lt;p&gt;You might take alook at [perlsub]. It is also included with the
standard perl distribution and gives some excellent reasons for not
prefixing subroutine calls with the ampersand &lt;code&gt;&amp;&lt;/code&gt;. One more
rule you might adopt is to avoid using global values in subroutines. When
possible attempt to pass all the needed variables to a sub and return
expected values from the sub. This makes the sub more portable and the
program easier to maintain.&lt;/p&gt;
With that in mind, we could change:&lt;br&gt;
&lt;code&gt;&amp;ProcessFile;
&lt;/code&gt;
to:
&lt;code&gt;my($raw_file_length, $clean_file_length) =
    process_file(
        $process_file_name, $clean_file_name, $correct_start_char,
        $correct_start_length, $correct_clean_length, $typed_length);
&lt;/code&gt;
All this might be shortened, without a loss of clarity with an array:&lt;br&gt;
&lt;code&gt;my @args = (
        $ARGV[0],           # file to process
    
        "$ARGV[0].clean",   # processed file
    
        16,                 # numeric value of the "usual"
                            # line starting character
    
        16,                 # "usual" starting length of lines
                            # starting with $correct_start_char
    
        13,                 # correct length of lines after they
                            # have been stripped
    
        14                  # length of lines that do not include
                            # the extra stop and start characters
    );


my($raw_file_length, $clean_file_length) = process_file( @args );
&lt;/code&gt;
&lt;p&gt;&lt;b&gt;NOTE:&lt;/b&gt; We could pass @args with a reference, but I'm not willing
to add too many new concepts at one time. [perlref] and [perlsub] have
more information.&lt;/p&gt;
Before we examine the subroutine, let's add the end of the program:&lt;br&gt;
&lt;code&gt;print "$raw_file_length lines read from $args[0]\n";
print "$clean_file_length lines written to $args[1]\n";

print "\a";
exit(0);
&lt;/code&gt;

&lt;p&gt;For the subroutine, we'll examine the logic of the &lt;code&gt;if/else&lt;/code&gt;
and we'll use the &lt;code&gt;substr&lt;/code&gt; function in place of the
&lt;code&gt;chomp/chop&lt;/code&gt; method you used.&lt;/p&gt;
&lt;code&gt;chomp $data; 
chop $data; 
$data = reverse ($data); 
chop $data; 
$data = reverse ($data); 
&lt;/code&gt;
Could be replaced with:&lt;br&gt;
&lt;code&gt;$data = substr($data, 1, -2);
&lt;/code&gt;
And:&lt;br&gt;
&lt;code&gt;chomp $data; 
$datalengthtrack--; 
chop $data; 
$datalengthtrack--; 
$data = reverse ($data); 
while ($datalengthtrack &gt; $correctcleanlength){ 
    chop $data; 
    $datalengthtrack--; 
} 
$data = reverse ($data); 
print CLEANFILE "$data\n";
&lt;/code&gt;
Could be replaced with:&lt;br&gt;
&lt;code&gt;print CLEANFILE substr( substr($data, 0, -2), -$correct_clean_length), "\n";
&lt;/code&gt;
The &lt;code&gt;if/else&lt;/code&gt; block might be re-written as:&lt;br&gt;
&lt;code&gt;if ( $start_char == $correct_start_char ) {
    next if $data_length != $correct_start_length;
    if ( $data_length == $correct_start_length ) {
        print CLEANFILE substr($data, 1, -2), "\n";
        $clean_file_length++;
    }
} else {
    if ( $data_length &gt; $correct_clean_length ) {
        print CLEANFILE substr( substr($data, 0, -2), -$correct_clean_length), "\n";
        $clean_file_length++;
    } elsif ( $data_length == $typed_length ) {
        print CLEANFILE $data;
        $clean_file_length++;
    }
}
&lt;/code&gt;
We'll have to retrieve those variables we passed to the sub:&lt;br&gt;
&lt;code&gt;my (  $file_name, $clean_file_name, $correct_start_char,
        $correct_start_length, $correct_clean_length, $typed_length) = @_;
&lt;/code&gt;
And I took the liberty of shortening the file handle names:&lt;br&gt;
&lt;code&gt;open RAW,   $file_name          or die "Cannot open $file_name: $!";
open CLEAN, "&gt;$clean_file_name" or die "Cannot open $clean_file_name: $!";
&lt;/code&gt;
And since we were actually keeping track of lines not file lengths, I used:&lt;br&gt;
&lt;code&gt;my($raw_lines, $clean_lines);
&lt;/code&gt;
in the sub, instead of:&lt;br&gt;
&lt;code&gt;my($raw_file_length, $clean_file_length)
&lt;/code&gt;

Putting it all together:
&lt;code&gt;#!/usr/bin/perl
####################################################
# InvClean.pl : Raw Scanned File Processing program
# Version 1.0
# Written by Robb Pickinpaugh
# 01/31/2002
# for use on Windows NT
####################################################
use strict;
use warnings;

my @args = (
        $ARGV[0],           # file to process

        "$ARGV[0].clean",   # processed file

        16,                 # numeric value of the "usual"
                            # line starting character

        16,                 # "usual" starting length of lines
                            # starting with $correct_start_char

        13,                 # correct length of lines after they
                            # have been stripped

        14                  # length of lines that do not include
                            # the extra stop and start characters
    );


my($raw_file_length, $clean_file_length) = process_file( @args );

print "$raw_file_length lines read from $args[0]\n";
print "$clean_file_length lines written to $args[1]\n";

print "\a";
exit(0);

sub process_file {
    my (
        $file_name, $clean_file_name, $correct_start_char,
        $correct_start_length, $correct_clean_length, $typed_length) = @_;

    open RAW,   $file_name          or die "Cannot open $file_name: $!";
    open CLEAN, "&gt;$clean_file_name" or die "Cannot open $clean_file_name: $!";

    my ($raw_lines, $clean_lines);
    while ( my $line = &lt;RAW&gt; ) {
        my $line_length = length $line;
        my $start_char = ord $line;
        if ( $start_char == $correct_start_char ) {
            next if $line_length != $correct_start_length;
            if ( $line_length == $correct_start_length ) {
                print CLEAN substr($line, 1, -2), "\n";
                $clean_lines++;
            }
        } else {
            if ( $line_length &gt; $correct_clean_length ) {
                print CLEAN substr( substr($line, 0, -2), -$correct_clean_length), "\n";
                $clean_lines++;
            } elsif ( $line_length == $typed_length ) {
                print CLEAN $line;
                $clean_lines++;
            }
        }
        $raw_lines = $.;
    }

    close RAW   or die "cannot close $file_name: $!";
    close CLEAN or die "cannot close $clean_file_name: $!";

    return ($raw_lines, $clean_lines);
}

__END__
&lt;/code&gt;
&lt;BR&gt;&lt;BR&gt;&lt;BR&gt;
HTH,&lt;BR&gt;
Charles K. Clarkson&lt;BR&gt;

HTH,
Charles K. Clarkson
Clarkson Energy Homes, Inc.

</field>
<field name="root_node">
142769</field>
<field name="parent_node">
142769</field>
</data>
</node>
