Efficient deletion of files / shell interaction

madd has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I have a directory with several hundred sets of files (an input, data and output file). The files all begin with a given prefix and have a 4 digit sequential number. i.e.

Kick0000.inp
Kick0000.data
Kick0000.log
[download]

are the input, data and output files respectively. I wish to delete the data files of jobs that did not terminate normally (i.e. jobs where the output does not end in "Normal termination"). The following script, using backticks works, but I'm not sure I've done it in the best way. Mainly, I'm concerned about using backticks twice each iteration. I'd appreciate any advice on whether the backticks really are a problem, and if so, how I could better write this script. Thanks in advance, Madd

#!/usr/bin/perl -s

$prefix="Kick";

$Restart="Restart.data";

open (RESTART,"$Restart") or die "Unable to open $Restart for reading\
+n";
$AlreadyDone = <RESTART>;
$jobs_run= sprintf '%04d',($AlreadyDone);

for (my $j=0;$j<=$jobs_run;$j++)
{
   $job_no=sprintf '%04d',($j);
   $job_title="$prefix$job_no";
   $job_end=`tail -n 1 $job_title.log`;
   if($job_end !~ /Normal/)
   {
      `rm $job_title.data`;
   }
}
[download]

Comment on Efficient deletion of files / shell interaction Select or Download Code

Replies are listed 'Best First'.
Re: Efficient deletion of files / shell interaction by Corion (Patriarch) on Jul 19, 2009 at 16:06 UTC
Instead of `rm`, do you maybe want the unlink function? Note that this function won't ask/warn you if you're deleting files that don't have the appropriate attributes set. Also, you might or might not want to read the files yourself instead of using `tail`, by using seek and the diamond operator.	[reply] [d/l] [select]
Re^2: Efficient deletion of files / shell interaction by madd (Acolyte) on Jul 19, 2009 at 16:34 UTC
Thanks, unlink works fine. I'll benchmark both approaches for my own edification later. However I'm not sure how to make the best use of seek and readline given that the log files are of varying lengths (~100kB - 10MB) and I'm always after just the last line. It seems a bit too much work to open, read and close several hundred files.	[reply]
Re^3: Efficient deletion of files / shell interaction by Your Mother (Archbishop) on Jul 19, 2009 at 17:17 UTC
In addition to Corion's advice there is File::ReadBackwards. I don't know if it would be faster than seek but it would probably be easier to code up since it's line instead of block-size oriented.	[reply]
Re^3: Efficient deletion of files / shell interaction by Corion (Patriarch) on Jul 19, 2009 at 16:49 UTC
seek allows you to seek to (just before) the end of the file and to read the last (say) 1024 bytes of the file, and then look there for what you're searching for. Which is about what `tail` does, too.	[reply] [d/l]
Re^3: Efficient deletion of files / shell interaction by graff (Chancellor) on Jul 19, 2009 at 20:30 UTC
The trick about using seek and read is that in order to seek to a position from the end of a file, you have to specify a negative number for the offset amount. For example, if the log files for successful runs always have the phrase "Normal termination\n" as the very last thing in each file, that's just 19 bytes you need to read from the end -- but let's pad that a bit, just to be safe: #!/usr/bin/perl use strict; my $prefix="Kick"; my $Restart="Restart.data"; open( RESTART, $Restart ) or die "Unable to read $Restart: $!\n"; my $AlreadyDone = <RESTART>; my ( $jobs_run ) = ( $AlreadyDone =~ /(\d+)/ ); for ( my $j=0; $j<=$jobs_run; $j++) { my $job_title = sprintf( "%s%04d", $prefix, $j ); if ( open( my $fh, "<", "$job_title.log" )) { seek( $fh, -24, 2 ); read( $fh, my $job_end, 24 ); unlink "$job_title.data" unless ($job_end =~ /Normal/); } else { warn "Unable to read $job_title.log: $!\n"; } } [download] Some miscellaneous notes: I didn't see any clear rationale for using the "-s" option on the shebang line. If you have a reason for that in your "production" script, it's fine, but it seemed unnecessary here. Your method of getting a numeric value from the "Restart.data" file was strange. I think a regex match for the numeric is better/safer. When you decide to report an error message regarding a failed `open()` call, including "$!" in the message can be very helpful. Overall, I think the overhead of opening and closing data files in perl will be less (given that you are only reading the last couple dozen bytes from each file), than the OS overhead of creating and tearing down 100's of subshells for running "tail", just as the use of perl's unlink function is fairly certain to be more efficient than a series of subshells that invoke the unix "rm" command. Notice how easy it was to add `use strict` (updated to add a missing close-paren -- forgot to do "perl -cw" before hitting "create")	[reply] [d/l] [select]
Re: Efficient deletion of files / shell interaction by kyle (Abbot) on Jul 19, 2009 at 17:55 UTC
I might write this, if I didn't care if anyone can read it. my @job_nums = 0 .. $jobs_run; my $logs = join q{ }, map { sprintf "$prefix%04d.log" } @job_nums; my $tail_rx = qr{ ==> (.+?) <== \n ( [^\n]* ) \n }xms; my @tail_pairs = `tail -n 1 $logs` =~ $tail_rx; unlink map { s/\.log$/.data/; $_ } map { $tail_pairs[$_2] } grep { $tail_pairs[1+$_2] !~ /Normal/ } @job_nums; [download] Note that I haven't tried this, and given that it calls unlink, one would do well to test it with a copy of real data before trying it with real data. This has the advantages that it shells out to run `tail` only once on the list of every log file and that it calls unlink only once with the list of the files to delete. On the other hand, the heavy use of map and grep means it does more looping than is absolutely necessary.	[reply] [d/l] [select]
Re: Efficient deletion of files / shell interaction by JavaFan (Canon) on Jul 19, 2009 at 18:58 UTC
`tail -1 ... list of file ... \|\ perl -nE 'given($_) {when (/==> (.) <==/) {$file = $1} when (/^\s$/} + {;} when {/Normal/} default {unlink $file or die}'` [download] This assumes you don't have files ending in '==> foo <=='.	[reply] [d/l]
Re: Efficient deletion of files / shell interaction by psini (Deacon) on Jul 19, 2009 at 16:09 UTC
unlink? Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."	[reply]


Pathologically Eclectic Rubbish Lister
	PerlMonks