Re: sizeDateValidator.pl is horribly slow

In your first try, you call stat several times on each file, and every one of those calls is a separate system call that wastes time. Call stat once per file, and save all the information it returns for your various checks.
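Here is a minimal sketch of that idea. `file_info` is a hypothetical helper, not something from your script. It also shows the special `_` filehandle: a file-test operator such as `-f _` reuses the buffer filled by the most recent `stat` instead of making another call, which helps if you prefer file tests to indexing into the list.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: collect everything we need from ONE stat() call.
# Returns a hashref, or undef (empty list) if stat() fails.
sub file_info {
    my ($path) = @_;
    my @st = stat $path or return;   # empty list means stat failed
    return {
        size  => $st[7],             # size in bytes
        mtime => $st[9],             # last modification time (epoch seconds)
        # "_" makes the file test reuse the stat() buffer filled above,
        # so this does not touch the filesystem again.
        is_file => ( -f _ ? 1 : 0 ),
    };
}

# Demo on a throwaway file containing 5 bytes.
my ( $fh, $demo ) = tempfile( UNLINK => 1 );
print {$fh} "hello";
close $fh;

my $info = file_info($demo);
printf "%d bytes, plain file: %s\n",
    $info->{size}, $info->{is_file} ? 'yes' : 'no';
```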

As for how long it should take to scan 20,000 files, what sort of time span are you expecting, and what sort of evidence (what sorts of processes) leads you to expect that?

There are some other trivial oddities in your first script -- I expect they don't affect the timing much (if at all), but they detract from the overall coherence of the code. Oh, and consistent indenting is useful...

Here's how I would do it:

use strict;
use warnings;
use POSIX ();   # only POSIX::strftime is used, and it is called fully qualified

# Get argv handling out of the way first...
if ( @ARGV != 3 or ! -f $ARGV[0] ) {
    die "Usage:  perl $0  FileListToValidate OutFile StatusFile\n";
}

# Next take care of all the i/o file handling...
if ( -e $ARGV[2] ) {
    die "$ARGV[2] already exists -- I will not overwrite it\n";
}
open( STAT, '>', $ARGV[2] ) or die "Can't write status info to $ARGV[2]: $!\n";

if ( ! open( OUT, '>', $ARGV[1] ) ) {
    print STAT "error: can't write output to $ARGV[1]: $!\n";
    exit 1;    # non-zero exit status signals failure
}
if ( ! open( IN, '<', $ARGV[0] ) ) {
    print STAT "error: can't open $ARGV[0] for input: $!\n";
    exit 1;
}

# Now get to work...
my @inpList = <IN>;
chomp @inpList;

for ( @inpList ) {   # let $_ hold the file name
    tr/"//d;  # get rid of double-quotes
    my @stats = stat;  # do this just once (works on $_ by default)
    if ( ! @stats ) {  # empty list means stat failed
        print OUT join( '|', $_, ( 'notfound' ) x 2 ), "\n";
    }
    else {
        print OUT join( '|', $_, $stats[7],   # size in bytes, then mtime
            POSIX::strftime( "%m/%d/%Y %I:%M %p", localtime( $stats[9] ) ) ), "\n";
    }
}
print STAT "success\n";

That eliminates a lot of needless variable declarations and assignments, but I expect that cutting the stat calls down to one per file is what will make a noticeable difference.
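A rough sketch with the core Benchmark module can check that claim on your own system. `several_stats` and `one_stat` are hypothetical stand-ins for the two approaches, not code from either script. The size of the gap will depend on the OS and filesystem: with the file's metadata cached locally it may be small, while on a slow or networked filesystem each extra stat is more likely to cost real time.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Repeated lookups: -e, -s and stat() each make their own stat system call.
sub several_stats {
    my ($f) = @_;
    return unless -e $f;
    return ( -s $f, ( stat $f )[9] );
}

# Single lookup: size and mtime come from the same stat() list.
sub one_stat {
    my ($f) = @_;
    my @st = stat $f or return;
    return @st[ 7, 9 ];
}

my ( $fh, $file ) = tempfile( UNLINK => 1 );
print {$fh} "some data\n";    # 10 bytes, so -s returns a true value
close $fh;

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese( -1, {
    several_stats => sub { my @r = several_stats($file) },
    one_stat      => sub { my @r = one_stat($file) },
} );
```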

Personally, I'd go with just two command-line args. Printing error messages (and even a "success" message) to STDERR should suffice, so you only need the input list and the name to use for the output file, and you eliminate two possible causes of failure (the status file already existing, or not being writable).
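A minimal sketch of that two-argument version, assuming errors and the final status line go to STDERR (redirecting with `2> status.log` still gives you a status file if you want one). `validate_list` is a hypothetical helper name. The driver is skipped when no arguments are given, so the subroutine can be loaded and exercised on its own; a real tool might prefer to print the usage message in that case too.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: read file names from $in_fh, write one
# "name|size|date" or "name|notfound|notfound" line per name to $out_fh,
# and return how many names were processed.
sub validate_list {
    my ( $in_fh, $out_fh ) = @_;
    my $count = 0;
    while ( my $name = <$in_fh> ) {
        chomp $name;
        $name =~ tr/"//d;          # get rid of double-quotes
        my @stats = stat $name;    # one stat call per file
        if ( ! @stats ) {
            print {$out_fh} join( '|', $name, ( 'notfound' ) x 2 ), "\n";
        }
        else {
            print {$out_fh} join( '|', $name, $stats[7],
                POSIX::strftime( "%m/%d/%Y %I:%M %p", localtime( $stats[9] ) ) ), "\n";
        }
        $count++;
    }
    return $count;
}

# Driver: runs only when arguments are given, e.g.
#   perl sizeDateValidator.pl FileList.txt Out.txt 2> status.log
if ( @ARGV ) {
    die "Usage:  perl $0  FileListToValidate OutFile\n"
        unless @ARGV == 2 and -f $ARGV[0];
    open( my $in,  '<', $ARGV[0] ) or die "error: can't open $ARGV[0] for input: $!\n";
    open( my $out, '>', $ARGV[1] ) or die "error: can't write output to $ARGV[1]: $!\n";
    my $n = validate_list( $in, $out );
    close( $out ) or die "error: problem writing $ARGV[1]: $!\n";
    warn "success: $n names checked\n";   # status goes to STDERR
}
```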

As for the second try, processing the output of some other command is bound to take longer (each run starts a new process) and can cause more trouble (you then have to parse that output correctly). Don't do that when a Perl built-in function can do the same thing.
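To get a feel for the cost, here is a rough sketch that looks up one file's size 20 times by starting a child process each time, and 20 times with stat. Since the second try's code isn't shown here, a child `perl` (`$^X`) stands in for whatever external command it ran; the helper names are hypothetical, and the list form of a piped `open` assumes a Unix-like system.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);      # fractional-second time()
use File::Temp qw(tempfile);

# External process per lookup: one fork+exec, then read and trust its output.
sub size_via_child {
    my ($path) = @_;
    open( my $ph, '-|', $^X, '-e', 'print -s $ARGV[0]', $path )
        or die "Can't start child process: $!\n";
    my $out = do { local $/; <$ph> };
    close $ph;
    return $out;
}

# Built-in: one stat() system call, nothing to parse.
sub size_via_stat {
    my ($path) = @_;
    return ( stat $path )[7];
}

my ( $fh, $file ) = tempfile( UNLINK => 1 );
print {$fh} 'x' x 1024;
close $fh;

my $runs = 20;
my ( $child_size, $stat_size );

my $t0 = time();
$child_size = size_via_child($file) for 1 .. $runs;
my $t_child = time() - $t0;

$t0 = time();
$stat_size = size_via_stat($file) for 1 .. $runs;
my $t_stat = time() - $t0;

printf "child process: %.4f s, stat: %.6f s for %d lookups\n",
    $t_child, $t_stat, $runs;
```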

P is for Practical