Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Pass the arguments to the procedure in an easier way

by Scottie (Novice)
on Jun 16, 2011 at 19:52 UTC ( [id://910017]=perlquestion: print w/replies, xml ) Need Help??

Scottie has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

I have a procedure whose task is to count checksums for files that are a input argument in subroutine. Argument can be:
- All files ('*') in a directory ($DIR).
- Files with the extensions, for example: '*.foo *.bar *.pl *.pm'
- Files with names beginning, for example: 'FOO* bar*'

--------------8<-------------- sub md5files { use File::Basename; my $dir = $_[0]; # where the files are (must be a di +rect path from root /) my $what = $_[1] || '*'; # what files, default = '*' (all fi +les) my $md5file = "$dir/checksums.md5"; my %hash_md5; my @t = split(/ /, $what); open (MD5OUT, '>', $md5file ) or die "Err: $!"; foreach my $f (@t) { foreach my $file2md5 (<$dir/$f>) { next if "$file2md5" eq "$md5file"; my $md5 = &md5calc($file2md5); if($md5 ne "") { $file2md5 =~ s/$dir\///; print MD5OUT $md5 . " " . $file2md5 . "\n"; $hash_md5{$md5} = $file2md5; } else { return ""; } } } close MD5OUT; return %hash_md5; } --------------8<--------------

Above sub can be called as follows:
my $DIR = '/tmp';
my %md5hash = &md5files($DIR);
my %md5hash = &md5files($DIR,'FOO* bar*');
my %md5hash = &md5files($DIR,'*.foo *.bar *.pl *.pm');

Is it possible to simplify my procedure? Could it be more perlish?

Thanks in advance.

Best regards,
--
Scottie

Replies are listed 'Best First'.
Re: Pass the arguments to the procedure in an easier way
by davido (Cardinal) on Jun 16, 2011 at 22:44 UTC

    One issue is that you're returning on a rejected file before finishing the list of files, and doing so without closing the filehandle. Since it's a bareword filehandle, it won't close automatically either. Use lexical filehandles, and don't return (just move onto the next file) in case of a rejected file.

    Some of the logic for rejecting files can be moved up out of the loop by grepping to obtain only those files you care about to begin with. It's a big "who cares" with respect to efficiency (introducing the grep loop while moving tests out of the inner loop) because it's doubtful that you're dealing with enough files for the loops to bog down. I just liked the clarity of eliminating rejects from the list being looped over, rather than dealing with them inside the loop. Obviously I did some checking inside the loop, where it would have impeded clarity to move it into the grep.

    I switched you over to lexical filehandles which will close themselves if they pass out of scope. But because it's an outfile you're dealing with I still retained a close, so that I could test for failure. You could have used the autodie pragma instead though.

    Once I started moving things around it seemed like some of the complexity cleared itself up. I don't think I altered your functionality, though I haven't tested. You would want to give it the solid run-through before depending on it not altering functionality. One thing that did change, is it will not return "" on failure. Instead, the hash will be empty.

    Just look over the code. It should make sense. I introduced another checks for you: -f (are we dealing with a file?).

    sub md5files { my $dir = shift; # where the files are (must be a # direct path from root /) my $what = shift || '*'; # what files, default = '*' my $md5file = shift || "$dir/checksums.md5"; # default. my %hash_md5; my @t = split(/ /, $what); open (my $md5_outfh, '>', $md5file ) or die "Err: $!"; foreach my $f ( @t ) { foreach my $file2md5 ( grep { -f and $_ ne $md5file } <$dir/$f> ) { next unless my $md5 = md5calc( $file2md5 ); my ( $filename ) = $file2md5 =~ m{/([^/]+)$}; say $md5_outfh "$md5\t$filename"; $hash_md5{$md5} = $filename; } } close $md5_outfh; return %hash_md5 or die $!; # Consider 'use autodie;' }

    Oh, also I couldn't see where you were using any functionality from File::Basename, so for the time being I commented it out (and removed it from within the sub).


    Dave

Re: Pass the arguments to the procedure in an easier way
by 7stud (Deacon) on Jun 16, 2011 at 21:17 UTC
    Is it possible to simplify my procedure? Could it be more perlish?

    1) Get rid of garbage like $_[0], $_1. Instead use shift():

    sub do_stuff { my $dir = shift; my $pattern = shift; ...

    ... or list assignment if you want to keep @_ intact:

    my ($dir, $pattern) = @_;

    Do either of those as the *first* line(s) of your sub.

    2) In general, your variable names are atrocious:

    $what --> $pattern_str @t --> @patterns $f --> $pattern $file2md5 --> $fname

    3) As an alternative to split() and looping, you could chdir($dir) and then glob($pattern_str).

    4) What do you think the quote marks do here:

    "$file2md5"

    5) Don't use bareword filehandles. Use a my() variable instead(which you can capitalize if you want to):

    my $fname_out = 'name.txt'; open my $OUTFILE, '>', $fname_out or die "Couldn't open $fname_out: $!"; #and then... print $OUTFILE "whatever";
Re: Pass the arguments to the procedure in an easier way
by Marshall (Canon) on Jun 16, 2011 at 22:11 UTC
    Some more comments for you:

    I would move "use File::Basename;" outside of the subroutine and put it near the top of the module or program. I don't think the effect of the "use" is local to just the subroutine and calling it repeatedly doesn't make sense. This is not like C where function declarations would go at beginning of the sub. use causes the code of the module to be executed. Of course since you didn't even use this module in the code, you could just delete this statement.

    If I have a single arg, or in an object to get the object ref, I use shift. Otherwise I use: my ($dir, $what) = @_;. In Perl 5.10 you can use the new //= operator, which tests for "definedness", $what //= '*';.

    I would rethink the error conditions. You return a null hash if any of the mdcalc's produce a null string result, but its not clear that any sort of warning or error output will be available to see which file that happened upon. Some sort of croak, die, warn or whatever may be appropriate there.

    The & is no longer needed in front of function calls. just: md5calc($file2md5) will suffice.

    Adding extra concatenation operators in the print is unnecessary. print MD5OUT $md5 . " " . $file2md5 . "\n"; could be print MD5OUT "$md5 $file2md5\n";.

    $file2md5 =~ s/$dir\///; this is just $f.

    A list of files like "a b c" in a single string is a weird way of doing that. The caller probably has these file names in an array, so use an array or ref to an array to pass these names, eliminating this split() stuff.

      $file2md5 =~ s/$dir\///; this is just $f.

      Also, you can use another delimiter for s/// so that you don't have to escape internal characters, e.g.

      s{$dir/}{}

      When you escape characters in a pattern, it makes the pattern hard to read.

      You also have to be aware that any characters in your dir name that are regex metacharacters, like a dot, braces, etc. need to be escaped to make perl recognize them as the literal characters rather than special regex characters. You can use quotemeta() on the string, which will escape any regex characters.

Re: Pass the arguments to the procedure in an easier way
by dbs (Sexton) on Jun 16, 2011 at 20:23 UTC
    I am no Perl guru, but I do know it is good practice to use @_ to pass args to a sub and not $_[0] 1 etc. Looks pretty Perl-ish to me. You could use FileHandle...I like it better:
    use FileHandle; my $RLOG = new FileHandle "+>> $runlog" or die $!;

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://910017]
Approved by Marshall
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (4)
As of 2024-04-25 14:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found