Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery

WARNING t0mas wrote BAD CODE

by merlyn (Sage)
on Jun 14, 2000 at 19:20 UTC ( #18118=note: print w/ replies, xml ) Need Help??

in reply to RE: RE: Odd file rename
in thread Odd file rename

BAD CODE ABOVE t0mas, please don't write a recursing routine for directory walking. You broke it in the normal way people broke it. You don't check for symlinks. Now it can run forever if the symlinks point "above".

Be smart. Don't be a cargo-cultist. Use File::Find for recursion.

-- Randal L. Schwartz, Perl hacker

Comment on WARNING t0mas wrote BAD CODE
RE: RE: RE: RE: Odd file rename
by t0mas (Priest) on Jun 14, 2000 at 19:24 UTC
    Please correct me merlyn but sholdn't the -d take care of that?

    /brother t0mas
      No. -d on a symlink that points to a directory returns true for both the symlink and the directory. Hence, you could end up anywhere, including your own parent.

      -- Randal L. Schwartz, Perl hacker

Warning: merlyn was a BAD MONK
by Ozymandias (Hermit) on Jun 14, 2000 at 19:50 UTC
    Look, merlyn, I like your work. You're a good author, and an incredibly good Perl coder. I only wish my code were as clean and neat as yours.

    That said, tone it down. Did t0mas give a sub-optimal piece of code? Sure. From your warning, sounds like it's not a good thing at all. And yes, using File::Find would probably be a better solution. But damnit, TMTOWTDI. And even if there WEREN'T, that's not an excuse to blast him like that.

    - Ozymandias

    Update: It's been a long morning in the chatterbox. For the record, no, I did not intend the original version of this post (ie everything above the word "Update") to be a personal attack on merlyn. We all know who he is, we all know and respect him. Most of us, if not all of us, even like him. I felt (and others agreed) that in this case, merlyn went over the top. After much discussion, he has moderated his tone a bit, and a message further down this thread clarifies his position a bit. I think that was a good thing to do; I think the new version is much, much better, and probably would not have generated the response the original did.

    Now this thread is over. If I have to, I'll find someone to post a Hitler reference. Don't think I won't do it...

      So you'd rather have people copy t0mas's work and then get code that goes off into never-never land or triggers CERT warnings?

      This is what peer review is about. t0mas posted code. I reviewed it. Sure, there's more than one way to do it, but this wasn't one of them.

      -- Randal L. Schwartz, Perl hacker

        Actually, no. I'd rather you be the kind of saint we all know you really are. As I said, you know more than I do; probably more than anyone here does. That doesn't mean +7 font tags are the right way to go. I'd rather you say that recursion the way t0mas did it causes problems with symlinks, and as a result you'd suggest using some other method. t0mas' method does work, though, right? If I were going to use it in some situation (entirely hypothetical) where symlinks weren't an issue, I might prefer to do it that way for some reason - and in that case, I'd rather have your suggestion on how to do it better. That's all I'm saying.

        - Ozymandias

      Nazis, nazis, you're all nazis! Are we done now?

      Please check the link above before voting me --. This was done tongue-in-cheek and partially in response to a lot of near-flaming Chatterbox discussion. Actually, I think this entire thing might have blown over much earlier if it weren't for a certain May 17 Initiate who kept fanning the flames.

      And for those who are sick of Godwin's law, please check out Seebach's law.

        Looks like you lose ;-)

RE: WARNING t0mas wrote BAD CODE
by t0mas (Priest) on Jun 15, 2000 at 02:54 UTC
    Hi merlyn. I did some benchmarking to see how opendir/readdir/closedir performed compared to File::Find.
    use Benchmark; use File::Find; my $dirSep = $^O eq "MSWin32" ? "\\" : "/"; my $fileCounter=0; $t0 = new Benchmark; &test1('Program'); $t1 = new Benchmark; print $fileCounter . "\n"; $fileCounter=0; $t2 = new Benchmark; &test2('Program'); $t3 = new Benchmark; print $fileCounter . "\n"; print "test1: ",timestr(timediff($t1, $t0)),"\n"; print "test2: ",timestr(timediff($t3, $t2)),"\n"; sub test1 { my $Dir = shift; opendir(DIR, $Dir) || die "Can't opendir $Dir: $!"; my @Files = grep { /\.txt$/ && -f "$Dir$dirSep$_" } readdir( +DIR); rewinddir(DIR); my @Dirs = grep { /^[^.].*/ && -d "$Dir$dirSep$_" && ! -l "$ +Dir$dirSep$_"} readdir(DIR); closedir DIR; foreach (@Files) { $fileCounter+=1; } foreach $SubDir (@Dirs) { &test1(join($dirSep,$Dir,$SubDir)) +; } }; sub test2 { my ($Dir) = shift; find(\&found, $Dir); } sub found { -f && /\.txt$/ && ($fileCounter+=1); }
    The test-machine was a dual boot P233MHz/256RAM. The Program directory is on a FAT32 partition.

    On Win2000 the results where:

    test1: 52 wallclock secs (17.89 usr + 34.68 sys = 52.57 CPU)
    test2: 50 wallclock secs (18.41 usr + 30.94 sys = 49.35 CPU)

    On Redhat Linux 6.0 the results where:
    test1: 14 wallclock secs ( 2.25 usr + 12.57 sys = 14.82 CPU)
    test2: 14 wallclock secs ( 2.86 usr + 10.11 sys = 12.97 CPU)

    Pretty even...

    When I added a /\.txt/ && to &found and to the @Files = grep {} statement in &test1 the results changed a bit. The more complex regexp (regexp wise), the worse Find::File performed on Win32.
    On Win2000 the new results where:

    test1: 30 wallclock secs (10.17 usr + 18.79 sys = 28.95 CPU)
    test2: 32 wallclock secs (12.19 usr + 19.76 sys = 31.95 CPU)

    On Linux it depended on when the regexp was evaluated. If I put it before -f, it performed better than if I put it after.

    I believe that the choise of method depends on the problem faced. My "every-day" environment is Win32 and in my opinion the File::Find has a simple and elegant syntax, but it performs worse than the opendir/readdir/closedir on my main environment.
    So until File::Find performs better on Win32, you can call me stupid and cargo-cultist, but I guess I'll stick to opendir/readdir/closedir.
    Please note that I don't want to start a war about this, I just wanted to clarify my opinions. I do agree that the code I posted had some issues.

    /brother t0mas
      my @Files = grep { /\.txt$/ && -f "$Dir$dirSep$_" } ... ... -f && /\.txt$/ && ($fileCounter+=1);
      You've made a big benchmarking error. You are comparing two very different, though seemingly equivalent, operations. In your first test, you first do a relatively inexpensive operation first which in most cases short-circuits the need to do the relatively expensive operation second (assuming there are relatively few '.txt' files).

      In the second test, you always perform the expensive operation (a file stat, or whatever the Windows equivalent is), which makes this an unfair comparison. Make the comparison fair and I think you'll be surprised.

      BTW, it would take more than the 4-6% improvement that you cite for me to reinvent this wheel anyway :-)

        I'm not surprised.

        If you read my post carefully you'll note that I wrote On Linux it depended on when the regexp was evaluated. If I put it before -f, it performed better than if I put it after..

        I've experimented a lot with this issue before making the post, and I've experimented a lot before making my original post which caused so much debate.

        The file stat has no _significant_ effect on the benchmark. No major impact as you say.

        You can try it yourself!

        The impact it have is on the first run only, since the second run is read from some file cache and become very in-expensive.

        I re-ran the benchmark today (my current box is a Pentium 1000, Windows 2000 with perl v5.6.1 built for MSWin32-x86-multi-thread) and included a find sub with no -f at all (the test3), hitting 1250 files:

        test1: 42 wallclock secs ( 9.21 usr + 31.04 sys = 40.26 CPU)
        test2: 52 wallclock secs (13.48 usr + 34.29 sys = 47.77 CPU)
        test3: 51 wallclock secs (13.14 usr + 34.33 sys = 47.47 CPU)


        I think the decicion to reinvent or not (in this case), depends on wether 4-6% is important or not. If 4-6% speed gain makes your program meet the specifications and fail otherwise, what then?.

        /brother t0mas

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://18118]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (10)
As of 2014-07-30 06:47 GMT
Find Nodes?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:

    Results (229 votes), past polls