
Re: Reinventing wheels based on bad benchmarks

by t0mas (Priest)
on Jan 10, 2003 at 08:23 UTC (#225741)

in reply to Reinventing wheels based on bad benchmarks
in thread Odd file rename

I'm not surprised.

If you read my post carefully, you'll note that I wrote: "On Linux it depended on when the regexp was evaluated. If I put it before -f, it performed better than if I put it after."

I experimented a lot with this issue before making that post, and I experimented a lot before making my original post, which caused so much debate.

The file stat has no _significant_ effect on the benchmark. No major impact as you say.

You can try it yourself!

The impact it has is on the first run only, since the second run is served from the file cache and becomes very inexpensive.

I re-ran the benchmark today (my current box is a Pentium 1000, Windows 2000 with perl v5.6.1 built for MSWin32-x86-multi-thread) and included a find sub with no -f at all (the test3), hitting 1250 files:

test1: 42 wallclock secs ( 9.21 usr + 31.04 sys = 40.26 CPU)
test2: 52 wallclock secs (13.48 usr + 34.29 sys = 47.77 CPU)
test3: 51 wallclock secs (13.14 usr + 34.33 sys = 47.47 CPU)
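
For anyone who wants to try a comparison of this kind, here is a minimal sketch. It is not the code behind the numbers above: the sub names (regex_then_stat, stat_then_regex, regex_only), the temporary directory, and the file counts are all assumptions made for illustration. It only shows how the order of the regex and -f can be varied under Benchmark.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical fixture: a throwaway directory with 20 .txt and 20 .log files.
my $dir = tempdir(CLEANUP => 1);
for my $n (1 .. 20) {
    for my $ext (qw(txt log)) {
        my $path = "$dir/file$n.$ext";
        open my $fh, '>', $path or die "cannot create $path: $!";
        close $fh;
    }
}

# Regex first: the stat behind -f only runs for names that already match.
sub regex_then_stat {
    my $count = 0;
    find(sub { $count++ if /\.txt$/ && -f $_ }, $dir);
    return $count;
}

# -f first: every entry is stat'ed before the regex is tried.
sub stat_then_regex {
    my $count = 0;
    find(sub { $count++ if -f $_ && /\.txt$/ }, $dir);
    return $count;
}

# No -f at all: counts every matching name, directories included.
sub regex_only {
    my $count = 0;
    find(sub { $count++ if /\.txt$/ }, $dir);
    return $count;
}

# Time each variant over the same tree; the iteration count is arbitrary.
timethese(200, {
    regex_then_stat => \&regex_then_stat,
    stat_then_regex => \&stat_then_regex,
    regex_only      => \&regex_only,
});
```

On a tree this small everything comes from the file cache after the first pass, so the figures mostly reflect per-entry overhead rather than disk I/O.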


I think the decision to reinvent or not (in this case) depends on whether 4-6% is important or not. If a 4-6% speed gain makes your program meet the specifications and fail otherwise, what then?

/brother t0mas

Replies are listed 'Best First'.
Re^2: Reinventing wheels based on bad benchmarks
by Aristotle (Chancellor) on Jan 11, 2003 at 18:11 UTC
    If 4-6% speed gain makes your program meet the specifications and fail otherwise, what then?

    It's a tired argument/rebuttal, but if reinventing this wheel puts you inside the specifications, then chances are the time you spent reinventing would have been far better invested in some other part of the code that will likely gain you more than a mere 5% average improvement.

    Remember that crawling directories is a heavily I/O bound activity where optimizations in your code are unlikely to be able to make a great deal of difference.

    However, as a suggestion (I haven't benched it), try this:

    # preprocess must return the names to process, so hand @_ back after counting
    sub test4 { find({ preprocess => sub { $fileCounter += grep { /\.txt$/ && -f } @_; @_ }, wanted => sub {} }, shift); }
    (Actually, I'm thinking I'll go submit a patch so that find doesn't require a wanted in case a preprocess and/or postprocess is given.)
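
    A self-contained sketch of the preprocess idea, for anyone who wants to run it. The count_txt name and the temporary directory tree are assumptions for illustration only. It relies on the documented File::Find behaviour that preprocess is called with a directory's entries and is expected to return the list that find should go on to process.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical fixture: two .txt files at the top level, one in a subdirectory.
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/sub" or die "cannot create $dir/sub: $!";
for my $path ("$dir/a.txt", "$dir/b.txt", "$dir/c.log", "$dir/sub/d.txt") {
    open my $fh, '>', $path or die "cannot create $path: $!";
    close $fh;
}

my $fileCounter = 0;

sub count_txt {
    my ($top) = @_;
    $fileCounter = 0;
    find({
        preprocess => sub {
            # Called once per directory with that directory's entries;
            # find() has already chdir'ed there, so -f works on bare names.
            $fileCounter += grep { /\.txt$/ && -f } @_;
            return @_;    # hand the entries back so find() keeps descending
        },
        wanted => sub { },    # required by find(), nothing left to do here
    }, $top);
    return $fileCounter;
}

print count_txt($dir), "\n";
```

    Returning @_ unchanged matters: if preprocess returns only the running count, find treats that number as the sole entry and never descends into subdirectories.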

    Makeshifts last the longest.

Re: Re: Reinventing wheels based on bad benchmarks
by runrig (Abbot) on Jan 10, 2003 at 18:16 UTC
    I ran my tests on a fairly slow Win95 PC, and came out with File::Find being slightly faster. All in all, for a simple find like this, I might use File::Find::Rule anyway (and it would have even more overhead):
    my @files = File::Find::Rule->file->name('*.txt')->in($dir);
