
use File::Slurp for (!"speed");

by etcshadow (Priest)
on Nov 08, 2004 at 09:02 UTC [id://406004]

I apologize up front for sounding nit-picky or petty. This is essentially a public rebuttal to a private comment. The comment went something along the lines of "We should start using File::Slurp because it is 'way more efficient' {than just undefing $/ locally}."
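
For anyone unfamiliar with the two idioms in question, here is a minimal sketch (the filename is invented for illustration):

use strict;
use warnings;
use File::Slurp;

# The pure-perl idiom: localizing $/ (the input record separator) makes
# <> read to EOF in one go; localizing @ARGV tells <> which file to open.
my $contents = do { local (@ARGV, $/) = 'some_file.txt'; <> };

# The module approach to the same task:
my $contents_too = read_file('some_file.txt');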

Well, first of all... this was a bit frustrating because it was said person's first day on the job, and he had yet to read more than a smattering of the roughly one million lines of code under the aegis of "the application". He had essentially no context for where the performance bottlenecks lie or what our priorities are. For example, among our priorities, stability far outranks speed... local $/; is pure perl, but File::Slurp is untested (by us), highly platform-specific (for whatever damn reason), foreign code. But anyway, I'm willing to cut the guy some slack... he's a smart guy, and he's new. He was just trying to establish his place with the alpha geeks.

But what about his argument? Well, I couldn't really give a damn about the microsecond that might be gained by doing raw I/O when reading a file smaller than a disk block (because, hell: we're not using slurping for any kind of sizable file... who would be doing that and still be concerned about performance?). However, I've recently had some dealings with the author of said module, and, frankly, wanted to see if his work stood up to the standard that he seems to set for everyone else (even when it is not an appropriate standard... but I digress). As one of my colleagues put it: "Man... if you're gonna act like that, you'd better never, ever make a mistake."

So, put up or shut up time:

So, I kept the benchmark super simple, and tried to control for the most obvious sources of error... I use freshly created, separate (but identical in content) files for local $/ and for File::Slurp. I run them each once, so that you can see the difference caused by the additional compile time of File::Slurp, and then I run them each 2000 times so that you can see the actual performance of the file reading. Then I repeat the 2000-iteration test, just to smooth out any problems that could potentially be caused by caching or whatnot:
[me@host test]$ echo -e foo\\nbar\\nbaz > a
[me@host test]$ time perl -e '$x = do { local (@ARGV, $/) = "a"; <> }; print $x'
foo
bar
baz

real    0m0.031s
user    0m0.000s
sys     0m0.000s
[me@host test]$ time perl -e '$x = do { local (@ARGV, $/) = "a"; <> } for 1..2000; print $x'
foo
bar
baz

real    0m0.074s
user    0m0.020s
sys     0m0.060s
[me@host test]$ time perl -e '$x = do { local (@ARGV, $/) = "a"; <> } for 1..2000; print $x'
foo
bar
baz

real    0m0.074s
user    0m0.040s
sys     0m0.030s
[me@host test]$ echo -e foo\\nbar\\nbaz > b
[me@host test]$ time perl -e 'use File::Slurp; $x = read_file "b"; print $x'
foo
bar
baz

real    0m0.066s
user    0m0.020s
sys     0m0.010s
[me@host test]$ time perl -e 'use File::Slurp; $x = read_file "b" for 1..2000; print $x'
foo
bar
baz

real    0m0.136s
user    0m0.090s
sys     0m0.040s
[me@host test]$ time perl -e 'use File::Slurp; $x = read_file "b" for 1..2000; print $x'
foo
bar
baz

real    0m0.138s
user    0m0.090s
sys     0m0.050s
[me@host test]$

So the short answer is that File::Slurp is about twice as slow (on a small file) as the simple perl built-in method for reading a file all at once. I'll reserve my rant about why something that is built into the language really necessitates an overly complicated module... I just wanted to make a point about the often-heard "File::Slurp is faster!" argument. (And, perhaps, to deflect a well-heaved stone back in the general direction of a glass house.)

------------ :Wq Not an editor command: Wq

Replies are listed 'Best First'.
Re: use File::Slurp for (!"speed");
by rob_au (Abbot) on Nov 08, 2004 at 10:23 UTC
    You may be interested in taking a look at the article "Perl Slurp-Eaze" - http://www.perl.com/pub/a/2003/11/21/slurp.html - which discusses and benchmarks a number of different methods for slurping files.

     

    perl -le "print unpack'N', pack'B32', '00000000000000000000001011101111'"

Re: use File::Slurp for (!"speed");
by Limbic~Region (Chancellor) on Nov 08, 2004 at 13:44 UTC
    etcshadow,
    It appears you quoted the new guy as having said 'way more efficient', but from the rest of the post it appears you interpreted that to mean faster. Personally, I have never heard anyone say File::Slurp is faster. There are many ways to be efficient (memory, CPU, IO, run-time, programming-time, etc). While efficiency sometimes results in faster run-times, speed is not a prerequisite for being efficient.

    Efficiency is about getting the biggest return for the smallest investment. It is also necessary to define "return" and "investment" in terms of the things that matter to you. Trading memory for speed wouldn't normally be called efficient, except that speed usually matters and memory doesn't until you run out of it.

    Personally, I would have asked the new guy in what way it was more efficient. It is possible that you did ask, that the response was speed, and that this rant is warranted - but you failed to include that information. It is also possible that because of the file size, whatever efficiency File::Slurp offered wasn't worth the trade (programmer-time efficiency). I would also like to know why it is considered more efficient, as I haven't had time to look at the code myself.
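
    One plausible answer, offered here only as a guess (this is a sketch of a technique, not File::Slurp's actual code): on larger files you can save something by pulling the whole file in with sysread, bypassing perl's buffered I/O layer entirely:

        use strict;
        use warnings;
        use Fcntl;

        # Slurp via unbuffered sysread: open without stdio buffering,
        # look up the size, and read it in as few syscalls as possible.
        sub sysread_slurp {
            my ($file) = @_;
            sysopen(my $fh, $file, O_RDONLY) or die "sysopen $file: $!";
            my $size = -s $fh;
            my ($buf, $off) = ('', 0);
            while ($off < $size) {
                my $got = sysread($fh, $buf, $size - $off, $off);
                die "sysread $file: $!" unless defined $got;
                last if $got == 0;    # premature EOF; file shrank?
                $off += $got;
            }
            return $buf;
        }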

    Cheers - L~R

Re: use File::Slurp for (!"speed");
by hardburn (Abbot) on Nov 08, 2004 at 14:31 UTC

    Usually when I slurp a file, it's not for speed. It's because the file format is easier to deal with if I grab it all at once. For instance, right now I'm building a module that has a fairly complex mini-language for its configuration file. I'm certain there won't be any speed benefit to slurping this file (a typical application will read the config exactly once at startup), but it's easier to feed Parse::RecDescent data that way.
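
    A minimal sketch of that pattern (the grammar and filename here are invented, and far simpler than a real config mini-language):

        use strict;
        use warnings;
        use Parse::RecDescent;

        # A toy grammar: "key = value" pairs standing in for the real
        # mini-language.
        my $grammar = q{
            config : pair(s)              { $item[1] }
            pair   : /\w+/ '=' /\S+/      { [ $item[1], $item[3] ] }
        };
        my $parser = Parse::RecDescent->new($grammar) or die "bad grammar";

        # Parse::RecDescent wants the whole text as one string, which is
        # exactly why slurping the config file is the natural fit.
        my $text  = do { local (@ARGV, $/) = 'app.conf'; <> };
        my $pairs = $parser->config($text) or die "parse failed";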

    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

Re: use File::Slurp for (!"speed");
by revdiablo (Prior) on Nov 08, 2004 at 18:20 UTC
    reading a file smaller than a disk block (because, hell: we're not using slurping for any kind of sizable file... who would be doing that and still be concerned about performance?)

    I think there is some area between a file "smaller than a disk block" and "any kind of sizable file." You only tested one end of the spectrum, leaving the middle untouched. Let's see how File::Slurp holds up with a bit more testing.

    First, the benchmark code:

    use File::Slurp;
    use Benchmark qw(cmpthese);

    cmpthese(-2, {
        fs => sub { $x = read_file "foo" },
        is => sub { $x = is("foo"); }
    });

    sub is { local (@ARGV, $/) = $_[0]; <> };

    Note, I put the idiomatic slurp into a subroutine. Much of File::Slurp's performance hit in the comparison was due to Perl's slow subroutine calls, so I thought it would be a more apples-to-apples comparison to add that same hit to the idiomatic slurp. Feel free to post results with the idiomatic slurp inlined, if you want, but all that will do is shift the point where File::Slurp overtakes the idiomatic slurp.
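
    For reference, the inlined variant (not benchmarked here) would look like this:

        use File::Slurp;
        use Benchmark qw(cmpthese);

        # Same comparison, but the idiomatic slurp pays no sub-call overhead.
        cmpthese(-2, {
            fs => sub { $x = read_file "foo" },
            is => sub { $x = do { local (@ARGV, $/) = "foo"; <> } },
        });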

    Now, some test runs:

    $ perl -e 'print "x"x500' > foo
    $ perl benchmark
           Rate    fs    is
    fs  30210/s    --  -33%
    is  44886/s   49%    --
    $ perl -e 'print "x"x5_000' > foo
    $ perl benchmark
           Rate    fs    is
    fs  27499/s    --  -26%
    is  37057/s   35%    --
    $ perl -e 'print "x"x50_000' > foo
    $ perl benchmark
           Rate    is    fs
    is  11275/s    --  -14%
    fs  13094/s   16%    --
    $ perl -e 'print "x"x500_000' > foo
    $ perl benchmark
          Rate   is   fs
    is   277/s   --  -15%
    fs   325/s  17%    --
    $ perl -e 'print "x"x5_000_000' > foo
    $ perl benchmark
          Rate    is    fs
    is  29.5/s    --  -17%
    fs  35.6/s   21%    --

    As we can see, the idiomatic slurp is faster for the 500 and 5,000 byte files, but once we get into the 50,000 range, File::Slurp takes the lead. You may consider 50k to be too big to slurp, but I certainly do not. In fact, I can even imagine circumstances where slurping the 5 MB file would be reasonable.

    Don't get me wrong. I'm not trying to disagree with your main point. In your particular case, switching to File::Slurp was probably not the right idea. But I certainly can see the case for slurping files that are 50k or more, and in this case, File::Slurp is faster.

      OK, let's not arbitrarily turn something into a function call that doesn't have to be, and, more importantly, let's recognize the overhead of loading a module:
      use Benchmark qw(cmpthese);

      cmpthese(-2, {
          fs => sub {
              delete $INC{'File/Slurp.pm'};
              require File::Slurp;
              File::Slurp->import();
              $x = read_file("foo")
          },
          is => sub { $x = do { local (@ARGV, $/) = "foo"; <> }; }
      });
      ---
      $ perl5.6.0 test.pl
      Benchmark: running fs, is, each for at least 2 CPU seconds...
              fs:  3 wallclock secs ( 2.08 usr +  0.11 sys =  2.19 CPU) @ 266.67/s (n=584)
              is:  2 wallclock secs ( 1.16 usr +  0.86 sys =  2.02 CPU) @ 47311.39/s (n=95569)
               Rate     fs     is
      fs      267/s     --   -99%
      is    47311/s 17642%     --
      And, yes, I am being a little tongue in cheek with that... but the point is: benchmarks are what you make of them.
      ------------ :Wq Not an editor command: Wq
        let's not arbitrarily turn something into a function call that doesn't have to be

        I didn't turn it into a function call arbitrarily. In fact, I explained my reasoning for doing so. I was attempting to compare the speed of File::Slurp to the idiomatic slurp. I was not attempting to measure the speed of Perl's subroutine calls, much less loading of modules. It's already well-established that these things are slow.

        This is how I see things so far: you stated that File::Slurp is slower than the idiomatic slurp. I demonstrated that it is, but only under very constrained circumstances. Then, you constrained the circumstances even further. I don't get the point you're going for here...

Re: use File::Slurp for (!"speed");
by ysth (Canon) on Nov 08, 2004 at 19:22 UTC
    To summarize revdiablo's results: if it's going to take more than 1/30000th of a second, File::Slurp is faster; otherwise the idiomatic slurp may be faster. Sounds like the OP here caught a serious case of premature optimization (ironically, while somewhat arguing against it).

    There are other ways to measure "efficiency". For instance, I know the difference between $/ and $\, but I'm still prone to mistype one for the other perhaps as much as 1% of the time. The time it takes to identify and correct this is a huge inefficiency; just using File::Slurp eliminates this problem, at least in the case of file slurping.
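
    For the record, the two easily-confused variables (the filename below is invented):

        use strict;
        use warnings;

        # $/ is the *input* record separator: what readline()/<> stops at.
        # $\ is the *output* record separator: appended by every print.
        {
            local $\ = "\n";          # print now supplies the trailing newline
            print "no explicit \\n needed";
        }
        {
            local $/;                 # undef: <> reads the whole file at once
            open my $fh, '<', 'some_file' or die "open: $!";
            my $all = <$fh>;          # one read, entire file
        }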

Re: use File::Slurp for (!"speed");
by diotalevi (Canon) on Nov 08, 2004 at 19:50 UTC
    I use File::Slurp because then I can write read_file( ... ) instead of do { local @ARGV = ...; local $/; scalar <> }. The former is easier to read. I also don't typically write my own read_file because when I do, it often comes out to about a dozen lines, and I'd prefer to let modules handle plumbing tasks that would otherwise just be clutter. I also share ysth's inability to remember which of $/ and $\ controls the EOL for readline and the EOL for print.
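
    For a sense of what "about a dozen lines" means, here is a sketch of a hand-rolled read_file with the usual error-checking plumbing (not File::Slurp's implementation):

        use strict;
        use warnings;
        use Carp;

        sub my_read_file {
            my ($file) = @_;
            open my $fh, '<', $file
                or croak "can't open '$file': $!";
            local $/;                         # slurp mode
            my $contents = <$fh>;
            croak "can't read '$file': $!" unless defined $contents;
            close $fh or croak "can't close '$file': $!";
            return $contents;
        }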
