RE (tilly) 2: List non-matching files

by tilly (Archbishop)
on Aug 20, 2000 at 00:17 UTC [id://28670]


in reply to RE: List non-matching files
in thread List non-matching files

In general, pure Perl solutions tend to be faster than find for all of the reasons that Perl usually beats shell scripting: you don't have to keep launching new processes. In this case it comes down to launching one rm and passing it a lot of filenames versus launching an rm per file. Guess which I think is faster?
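
To make that concrete, here is a rough sketch of the two approaches (the *.bak pattern is invented for illustration):

    #!/usr/bin/perl
    # Sketch only: delete every *.bak file in the current directory.
    opendir my $dh, "." or die "opendir: $!";
    my @victims = grep { -f $_ and /\.bak$/ } readdir $dh;
    closedir $dh;

    # One process total: Perl itself calls unlink(2) on each name.
    unlink @victims;

    # One process per file: every iteration forks and execs /bin/rm.
    # system("rm", "-f", $_) for @victims;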

However, find has one huge advantage: it is one of the few ways to get around the shell's limit on command-line length when you have to handle a very large number of files. The nom script given doesn't do that.
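
Perl can get around the same limit, for what it's worth. A sketch along these lines (File::Find is in the core; the *.html pattern is just this thread's example) never builds a command line at all:

    #!/usr/bin/perl
    # Walk the whole tree without ever building a shell command line,
    # so the argument-length limit never comes into play.
    use File::Find;
    find(sub { unlink $_ if -f $_ and not /\.html$/ }, ".");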

A second advantage is that while find has a more complex API, it is also more flexible... :-)

I agree, honest.
by gryng (Hermit) on Aug 20, 2000 at 22:34 UTC
    find . -maxdepth 1 -type f -not -name '*.html' -print | xargs rm -f

    The above only launches three processes (well, a few more if xargs decides there are too many files for one command line), and since the job is I/O bound, I doubt a Perl-based solution would be significantly faster (personally, my wager is that it would be slower).
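
    For the curious, the batching xargs does can be imitated in a few lines of Perl; the 128K ceiling below is an arbitrary stand-in for the real system limit:

    #!/usr/bin/perl
    # Rough imitation of xargs: run rm once per batch of names whose
    # combined length stays under an arbitrary 128K ceiling.
    my @files = grep { -f $_ and not /\.html$/ } glob("*");
    my @batch;
    my $len = 0;
    for my $name (@files) {
        if (@batch and $len + length($name) + 1 > 131072) {
            system("rm", "-f", @batch) == 0 or warn "rm failed: $?";
            @batch = ();
            $len   = 0;
        }
        push @batch, $name;
        $len += length($name) + 1;
    }
    system("rm", "-f", @batch) if @batch;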

    However, I agree that shell scripting is generally slower than Perl, because of process creation. But I don't think this case counts.

    Ciao,
    Gryn :)

      And it freaks badly if you have any filenames with whitespace in them, especially a newline. This same thing in Perl works just fine in one process:
      #!/usr/bin/perl
      opendir DOT, ".";
      unlink grep { -f and not /\.html$/ } readdir DOT;
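
      A quick throwaway test of the newline claim (the filename is invented):

      #!/usr/bin/perl
      # Create a file whose name contains a newline, then remove it with
      # readdir/unlink; no shell ever parses the name, so nothing freaks.
      my $evil = "bad\nname.txt";
      open my $fh, ">", $evil or die "create: $!";
      close $fh;
      opendir DOT, "." or die "opendir: $!";
      unlink grep { /\n/ } readdir DOT;
      print((-e $evil) ? "still there\n" : "gone\n");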

      -- Randal L. Schwartz, Perl hacker

        Very true :), kudos.
      In case you care:

      find . -maxdepth 1 -type f -not -name '*.html' -exec rm '{}' \;

      Does the same job with only one process.

      For me, performance doesn't matter that much. I usually deal with 100 to 2000 files at a time, and the running time isn't much different.

      One should measure the total time from the split second your brain decides what it wants to do until you see the next command prompt :)

      This is why it's useful to have simple building blocks (with short names :) that do the job.

        Sorry, not true. From a manpage for find:
        -exec command ;
            Execute command; true if 0 status is returned. All following arguments to find are taken to be arguments to the command until an argument consisting of `;' is encountered. The string `{}' is replaced by the current file name being processed everywhere it occurs in the arguments to the command, not just in arguments where it is alone, as in some versions of find. Both of these constructions might need to be escaped (with a `\') or quoted to protect them from expansion by the shell. The command is executed in the starting directory.
        Every time find reaches the -exec it launches a new process. Your version actually launches a separate instance of /bin/rm per file processed! (Good thing *nix optimizes process creation!)

        But for one-off jobs, you are right. How long it takes you to remember how to do it probably matters more than any details about how much work it is for the computer. (For mass deletes I usually write a short Perl script rather than look at find just because I know Perl very well. YMMV.)
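
        One possible shape for such a throwaway script (the *.html pattern is just this thread's example); a per-file loop also reports exactly which names could not be removed, which a single long unlink list would not:

        #!/usr/bin/perl
        # Throwaway mass delete: keep *.html, remove every other plain
        # file, and complain about any name that will not go away.
        opendir my $dh, "." or die "opendir: $!";
        for my $name (readdir $dh) {
            next unless -f $name;
            next if $name =~ /\.html$/;
            unlink $name or warn "could not unlink $name: $!\n";
        }
        closedir $dh;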

        But these performance considerations matter a lot for jobs that will be run repeatedly...
