PerlMonks  

Myth busted: Shell isn't always faster than Perl

by zentara (Archbishop)
on Dec 30, 2005 at 17:29 UTC

Hi, this is just one of those happy things that make you glad to use Perl. Someone asked a question in a newsgroup on how to recursively delete files from a directory tree, and leave the directories. So I showed him this Perl script.
#!/usr/bin/perl
use warnings;
use File::Find;

# takes list of dirs on commandline
# must give one or get an error
finddepth sub {
    return if $_ eq "." or $_ eq "..";
    return if -d;
    unlink($_);
    # print "$File::Find::name\n"; # if you want printout
}, @ARGV;
__END__

So a couple of the shell gurus, who like to "bash" Perl, said this is faster:

find . -type f -exec rm {} \;
So I decided to test both on a deeply nested directory tree, 80 MB in size, and timed them.
$ time find . -type f -exec rm {} \;
real    0m2.987s
user    0m0.785s
sys     0m2.184s

$ time ./zdelfiles Gtk3
real    0m0.384s
user    0m0.076s
sys     0m0.308s

The Perl script was in the range of 10 times faster. :-) Comments, improvements, and edifications welcome.


I'm not really a human, but I play one on earth. flash japh

Re: Myth busted: Shell isn't always faster than Perl
by jdporter (Canon) on Dec 30, 2005 at 17:50 UTC
    I think you'd better do
    sub { -f _ or return; unlink $_ }
    You really only want to unlink "regular" files; and this makes the comparison apples-and-apples with the shell version.

    Also, explicitly testing '.' and '..' is superfluous, because they'd be caught by -d.

    We're building the house of the future together.

        Yes, my reply originally looked like that; but as the OP said, you may want to do additional things, such as reporting.

        We're building the house of the future together.
Re: Myth busted: Shell isn't always faster than Perl
by Roy Johnson (Monsignor) on Dec 30, 2005 at 18:54 UTC
    The problem with the shell version is that it's spawning a new process for every rm. The usual practice is to use xargs in conjunction with find.
    time find . -type f -print | xargs rm
    I don't know how much that will affect your timings, though.

    Caution: Contents may have been coded under pressure.
      Yeah, that brings the shell closer in speed, BUT it starts complaining AND skipping filenames with spaces in them. I believe that is why the original construct was the way it was.

      I'm not really a human, but I play one on earth. flash japh
        it starts complaining AND skipping filenames with spaces in them

        ...And that is precisely why find has the -print0 switch and xargs has the -0 (or --null) switch.

        find . -type f -print0 | xargs -0 rm
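A quick way to convince yourself that the NUL-delimited form handles awkward names (the filenames below are invented for the demo):

```shell
# Demo with made-up filenames containing spaces.
set -e
d=$(mktemp -d)
touch "$d/plain" "$d/has space" "$d/two  spaces here"
# NUL-delimited names pass through the pipe unmangled
find "$d" -type f -print0 | xargs -0 rm
test ! -e "$d/has space"
rmdir "$d"                # succeeds only because the dir is now empty
```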
        We're building the house of the future together.
      I'd say it's a wash. Trying it several times, the best times I got were:
      time perl -MFile::Find -e'finddepth sub { unlink if -f }, @ARGV' /tmp
      real    0m3.111s
      user    0m0.821s
      sys     0m2.233s
      
      time find /tmp -type f | xargs rm
      real    0m3.312s
      user    0m0.760s
      sys     0m2.511s
      
      And the times varied widely - anywhere up to 5 seconds.

      Remember: There's always one more bug.
        I tested my original script with the
        return if -d; unlink $_
        against the golfed
        unlink $_ if -f;
        and the golfed -f test seems to be a bit slower. Maybe because unlink somehow gets called for each directory then stopped? Whereas the 'return if -d ' returns immediately. BUT the improved shell with null
        find . -type f -print0 | xargs -0 rm
        seems to win :-(
        time -d-test Gtk3
        real    0m0.412s
        user    0m0.074s
        sys     0m0.337s

        time -f-test Gtk3
        real    0m0.478s
        user    0m0.076s
        sys     0m0.388s

        time find . -type f -print0 | xargs -0 rm
        real    0m0.334s
        user    0m0.012s
        sys     0m0.321s

        I'm not really a human, but I play one on earth. flash japh
Re: Myth busted: Shell isn't always faster than Perl
by itub (Priest) on Dec 30, 2005 at 19:29 UTC
    I had never heard that myth; actually, I tend to hear the opposite. The truth is, it depends. ;-)
Re: Myth busted: Shell isn't always faster than Perl
by Anonymous Monk on Dec 30, 2005 at 19:59 UTC
    That is outstanding. We had the same type of discussion at my company. We had a failed bash script that we needed to fix, but no one really knows bash; we are Perl guys.
Re: Myth busted: Shell isn't always faster than Perl
by Anonymous Monk on Dec 30, 2005 at 23:51 UTC
    Passing off your own inability to develop a quality shell script as a defect of the shell. Clever plan.
      Well, that speaks to the point I'm making. The people who suggested the original slow shell script are well-respected and talented shell programmers, and my "run-of-the-mill" Perl script beat it. So when someone says, "why use Perl, I can do it faster with a shell script", you'd better think twice; maybe the Perl is faster.

      Also, the optimized shell script only beat the Perl version by a nose. Considering how much more flexible the Perl script is in processing the files as they are found, run-of-the-mill Perl is likely to be faster than run-of-the-mill shell doing some equivalent task. Shell, with its constant spawning of awk, sed, etc., is probably harder to run at optimized speed compared to Perl.


      I'm not really a human, but I play one on earth. flash japh
Re: Myth busted: Shell isn't always faster than Perl
by Perl Mouse (Chaplain) on Dec 31, 2005 at 00:40 UTC
    I've never heard of the myth "Shell is always faster". Not that your test busts any myth - using the 'exec' option to delete one file at a time is a far from optimal solution. As pointed out, the '-print0' in combination with 'xargs' is much more efficient, as it saves spawning a gazillion processes.
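As a side note not raised in the thread, POSIX find can also batch arguments itself with the '+' terminator to -exec, which avoids the per-file fork much like the xargs pipeline (demo tree is made up):

```shell
# '+' tells find to collect pathnames and run rm once per batch, not per file.
set -e
d=$(mktemp -d)
mkdir -p "$d/sub"
touch "$d/f1" "$d/sub/f2"
find "$d" -type f -exec rm {} +
test ! -e "$d/sub/f2"   # files removed
test -d "$d/sub"        # directories untouched
```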

    I'm a bit surprised, however, that no one so far has piped in with the "programmer time is more costly than running time" mantra. Surely the 2-second difference in running time is dwarfed by all the extra typing you need for your Perl solution. Or are Perl programmers cheap, and shell programmers expensive?

    I would always go for the shell solution. I'll have deleted all the files even before you've finished typing your Perl program.

    Perl --((8:>*
      I never type out a script more than once, it goes into a /bin directory in my path. "Damn it Jim, I'm a Perl hacker, NOT a typist" :-)

      I'm not really a human, but I play one on earth. flash japh
        But if you haven't been on the system yet, you haven't had a chance to install your "delete files and leave the directory structure" program yet.

        One way of doing system administration is to write a little program for every minor task you want. A small change, a different program. And then, everyone has to carry disks with their personal libraries around. Granted, it's workable.

        I myself prefer the Unix/POSIX solution. Lots of small tools, that can be stacked like legos. Tools that are everywhere, like find and xargs. When I sit down at a Unix system, I can type

        find . -type f -print0 | xargs -0 rm
        to delete files, and leave the directory structure as is. I don't have to remember whether I installed a program doing this for me on the box, and if I did, what it's called. And I don't need to write a new program if I want to delete all files older than a week - just add an extra option to find. (Sure, you could enhance your program so that it takes all kinds of options, but if you have to type as many options to your program as to find, you might as well have used find in the first place.)
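The older-than-a-week variant really is just one more find predicate. A small sketch (the demo backdates one file artificially with touch -t so that -mtime +7 has something to match):

```shell
# Demo: delete only files older than 7 days; timestamps here are faked.
set -e
d=$(mktemp -d)
touch "$d/fresh"
touch -t 202001010000 "$d/stale"    # pretend this file is years old
find "$d" -type f -mtime +7 -print0 | xargs -0 rm
test ! -e "$d/stale"   # old file removed
test -e "$d/fresh"     # recent file kept
```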

        I'm not a monoculturist programmer. For anything complex, I write a Perl or a C program (preferably Perl, but that isn't always available - if all you have is a few Mb of RAM and a dozen or so Mb on disk, there's no Perl, but busybox stacks a lot of goodies in just a few kb). But I don't bother writing programs for tasks that I don't do that often and that only require a few simple commands. That's not efficient.

        Perl --((8:>*

      I would always go for the shell solution. I'll have deleted all the files even before you've finished typing your Perl program.

      Well, since you are being snarky, I'll respond in kind: I doubt it. I reckon you'll still be fighting with the shell syntax, and double-checking that the switches and utilities you got so used to in bash are actually present in the shell you need to run it on. And even then you still won't be 100% confident that it will all work as expected.

      Which to me is the reason that Perl scripts beat shell scripts hands down pretty well every time. I can use the same Perl script on pretty much every shell and OS I can find. Your shell script will only work on a small subset of them, and will require massive changes for some of them.

      Shell scripts are only worth thinking about if you are a monoculture programmer. Since I'm not, I view them mostly with contempt. Who needs shell scripts when you have Perl scripts instead?

      ---
      $world=~s/war/peace/g

        I reckon youll still be fighting with the shell syntax

        I can type find | xargs pipes in my sleep.

        doublechecking that the switches and utilities you got so used to in bash are actually present in the shell you need to run it on

        Present in the shell? They’re external binaries; which shell you’re using is irrelevant. Maybe “present on the system,” except that if find, xargs and rm are not present, that is one very broken system. And the -print0/-0 switches are available on these commands on all Unixoid systems where I cared to look.

        And all that is far more likely to be around than perl, in any case.

        If your portability argument concerns moving between Windows and Unix, well, I can see how someone working on Windows would prefer to always use Perl… :-)

        Makeshifts last the longest.

        Well, since you are being snarky I'll respond in kind: I doubt it, i reckon youll still be fighting with the shell syntax, and doublechecking that the switches and utilities you got so used to in bash are actually present in the shell you need to run it on. And even then you still wont be 100% confident that it will all work as expected.
        Bollocks. find | xargs has worked on every Unix system I've used for the last 30 years. Out of the box. In any shell, as the only 'shell' thing here is the pipe, which is universal. It has worked long before Larry released perl1.0, and it will continue to work long after perl5 will be a distant memory.
        Which to me is the reason that perl scripts beat shell scripts hands down pretty well every time. I can use the same perl script on every shell and OS I can find pretty much. Your shell script will only work on a small subset of them, and will require massive changes for some of them.
        The shell solution will work on at least anything that's POSIX-compliant. Will your Perl program work in perl6? How would you know - it may work on today's version of perl6, but maybe not on next week's. As for Perl being present on the OS by default, for many OSes it's only quite recently that the OS came with some version of perl5 installed.
        Shell scripts are only worth thinking about if you are a monoculture programmer. Since I'm not I view them mostly with contempt. Who needs shell scripts when you have perl scripts instead?
        So, you do everything with Perl scripts, so you're not a monoculture programmer? Interesting. What's your definition of monoculture then?

        But you're right. Once you have a truck, you have no need for a bicycle. It's much easier to start up the truck and find a parking spot, just to get a newspaper from the shop around the corner. It's cheaper as well. Bicyclists are monoculture traffic participants - none of them know how to drive a car.

        Perl --((8:>*
Re: Myth busted: Shell isn't always faster than Perl
by Tanktalus (Canon) on Dec 31, 2005 at 17:37 UTC

    zentara, try this one. I wrote this many years ago to clean up hundreds of MB of source code (meaning hundreds of thousands of files) and it seems pretty fast. Way faster than rm -rf, for example. However, my goal wasn't to remove just the files, but the whole tree. I'll comment out the part that removes directories just to make it do what yours does. Granted ... this is a bit more complex. But it can't easily be duplicated in shell.

    use strict;
    use warnings;
    $| = 1;

    foreach my $d (@ARGV) {
        remove_dir($d);
        rmdir $d;
    }
    print "\nDone.\n";

    sub remove_dir {
        my $d = shift;
        if ( -f $d or -l $d ) {
            unlink $d;
            return;
        }
        # must be a directory?
        my (@sfiles, @sdirs);
        local *DIR;
        opendir(DIR, $d) || do { print "Can't open $d: $!\n"; return };
        foreach (readdir(DIR)) {
            next if $_ eq '.';
            next if $_ eq '..';
            my $sd = "$d/$_";
            if    ( -l $sd ) { push @sfiles, $sd }
            elsif ( -d $sd ) { push @sdirs,  $sd }
            else             { push @sfiles, $sd }
        }
        closedir(DIR);
        print ".";

        # process subdirectories via fork
        my $count;
        foreach my $sd (@sdirs) {
            my $pid;
            if ($pid = fork()) {        # parent
                ++$count;
            }
            elsif (defined $pid) {      # child
                remove_dir($sd);
                exit;
            }
            else {                      # failure - try again in a bit
                sleep 5;
                redo;
            }
            while ($count > 2) {
                wait();
                $count--;
            }
        }
        while (wait() != -1) {}

        #foreach (@sdirs) {
        #    rmdir $_ || do {
        #        warn "$0: Unable to remove directory $_: $!\n";
        #    };
        #}

        my @cannot = grep { !unlink($_) } @sfiles;
        if (@cannot) {
            warn "$0: cannot unlink @cannot\n";
        }
    }
    I'll also add that the difference in speed between .4s and 3s is quite negligible when compared to the amount of time it takes to remember and write them. This example above is ludicrously expensive to write, but it is something I do enough that I call it "RD" (yes, upper-case - it's too dangerous to get a short lower-case name) and put it in /usr/local/bin on all machines, all platforms, that I have access to (primarily as a symlink to a shared NFS partition). We really do use it that much ;-)

Re: Myth busted: Shell isn't always faster than Perl
by runrig (Abbot) on Jan 02, 2006 at 19:31 UTC
    It depends on what you're doing also. Once when I rewrote a third-party utility in perl, rewriting this bit caused it to go slower in perl:
    grep "^function" *.4gl | sed "s/\(.*\):function \(.*\)(.*/\2 \1 \/^function \2(/"
    But the above was wrong, so I rewrote a "correct" perl version:
    /^\s*function\s+(\w+)\s*\(/i # and then use hashes to save data so there's no s///
    I rewrote the new perl version in shell (grep/sed) for kicks, and it was slower than the perl version (and much uglier).

Node Type: perlmeditation [id://520031]
Approved by holli