
use File::Slurp for (!"speed");

by etcshadow (Priest)
on Nov 08, 2004 at 09:02 UTC [id://406004]

I apologize up front for sounding nit-picky or petty. This is essentially a public rebuttal to a private comment. The comment went something along the lines of "We should start using File::Slurp because it is 'way more efficient' {than just undefing $/ locally}."
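
For anyone unfamiliar with the two idioms in question, here is a minimal sketch (the filename is invented for illustration):

use strict;
use warnings;
use File::Slurp;

# The pure-perl idiom: localizing $/ (the input record separator) makes
# <> read to EOF in one go; localizing @ARGV tells <> which file to open.
my $contents = do { local (@ARGV, $/) = 'some_file.txt'; <> };

# The module approach to the same task:
my $contents_too = read_file('some_file.txt');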

Well, first of all... this was a bit frustrating because it was said person's first day on the job, and he had yet to read more than a smattering of the roughly one million lines of code under the aegis of "the application". He had essentially no context for where the performance bottlenecks lie or what our priorities are. For example, among our priorities, stability far outranks speed... local $/; is pure perl, but File::Slurp is untested (by us), highly platform-specific (for whatever damn reason), foreign code. But anyway, I'm willing to cut the guy some slack... he's a smart guy, and he's new. He was just trying to establish his place with the alpha geeks.

But what about his argument? Well, I couldn't really give a damn about the microsecond that might be gained by doing raw I/O when reading a file smaller than a disk block (because, hell: we're not using slurping for any kind of sizable file... who would be doing that and still be concerned about performance?). However, I've recently had some dealings with the author of said module, and, frankly, wanted to see if his work stood up to the standard that he seems to set for everyone else (even when it is not an appropriate standard... but I digress). As one of my colleagues put it: "Man... if you're gonna act like that, you'd better never, ever make a mistake."

So, put up or shut up time:

So, I kept the benchmark super simple, and tried to control for the most obvious sources of error... I use freshly created, separate (but identical in content) files for local $/ and for File::Slurp. I run them each once, so that you can see the difference caused by the additional compile time of File::Slurp, and then I run them each 2000 times so that you can see the actual performance of the file reading. Then I repeat the 2000-iteration test, just to smooth out any problems that could potentially be caused by caching or whatnot:
[me@host test]$ echo -e foo\\nbar\\nbaz > a
[me@host test]$ time perl -e '$x = do { local (@ARGV, $/) = "a"; <> }; print $x'
foo
bar
baz

real    0m0.031s
user    0m0.000s
sys     0m0.000s
[me@host test]$ time perl -e '$x = do { local (@ARGV, $/) = "a"; <> } for 1..2000; print $x'
foo
bar
baz

real    0m0.074s
user    0m0.020s
sys     0m0.060s
[me@host test]$ time perl -e '$x = do { local (@ARGV, $/) = "a"; <> } for 1..2000; print $x'
foo
bar
baz

real    0m0.074s
user    0m0.040s
sys     0m0.030s
[me@host test]$ echo -e foo\\nbar\\nbaz > b
[me@host test]$ time perl -e 'use File::Slurp; $x = read_file "b"; print $x'
foo
bar
baz

real    0m0.066s
user    0m0.020s
sys     0m0.010s
[me@host test]$ time perl -e 'use File::Slurp; $x = read_file "b" for 1..2000; print $x'
foo
bar
baz

real    0m0.136s
user    0m0.090s
sys     0m0.040s
[me@host test]$ time perl -e 'use File::Slurp; $x = read_file "b" for 1..2000; print $x'
foo
bar
baz

real    0m0.138s
user    0m0.090s
sys     0m0.050s
[me@host test]$

So the short answer is that File::Slurp is about twice as slow (on a small file) as the simple perl built-in method for reading a file all at once. I'll reserve my rant about why something that is built into the language really necessitates an overly complicated module... I just wanted to make a point about the often-heard "File::Slurp is faster!" argument. (And, perhaps, to deflect a well-heaved stone back in the general direction of a glass house.)

------------ :Wq Not an editor command: Wq

Replies are listed 'Best First'.
Re: use File::Slurp for (!"speed");
by rob_au (Abbot) on Nov 08, 2004 at 10:23 UTC
    You may be interested in taking a look at the article "Perl Slurp-Eaze" - http://www.perl.com/pub/a/2003/11/21/slurp.html - which discusses and benchmarks a number of different methods for slurping files.

     

    perl -le "print unpack'N', pack'B32', '00000000000000000000001011101111'"

Re: use File::Slurp for (!"speed");
by Limbic~Region (Chancellor) on Nov 08, 2004 at 13:44 UTC
    etcshadow,
    It appears you quoted the new guy as having said 'way more efficient', but from the rest of the post it appears you interpreted that to mean faster. Personally, I have never heard anyone say File::Slurp is faster. There are many ways to be efficient (memory, CPU, IO, run-time, programming-time, etc). While efficiency sometimes results in faster run-times, speed is not a prerequisite for being efficient.

    Efficiency is about getting the biggest return for the smallest investment. It is also necessary to define "return" and "investment" in terms of the things that matter to you. Trading memory for speed wouldn't normally be called efficient, except that speed usually matters and memory doesn't until you run out of it.

    Personally, I would have asked the new guy in what way it was more efficient. It is possible that you did ask, that the response was speed, and that this rant is warranted - but you failed to include that information. It is also possible that because of the file size, whatever efficiency File::Slurp offered wasn't worth the trade (programmer-time efficiency). I would also like to know why it is considered more efficient, as I haven't had time to look at the code myself.
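
    One plausible answer, offered here only as a guess (this is a sketch of a technique, not File::Slurp's actual code): on larger files you can save something by pulling the whole file in with sysread, bypassing perl's buffered I/O layer entirely:

        use strict;
        use warnings;
        use Fcntl;

        # Slurp via unbuffered sysread: open without stdio buffering,
        # look up the size, and read it in as few syscalls as possible.
        sub sysread_slurp {
            my ($file) = @_;
            sysopen(my $fh, $file, O_RDONLY) or die "sysopen $file: $!";
            my $size = -s $fh;
            my ($buf, $off) = ('', 0);
            while ($off < $size) {
                my $got = sysread($fh, $buf, $size - $off, $off);
                die "sysread $file: $!" unless defined $got;
                last if $got == 0;    # premature EOF; file shrank?
                $off += $got;
            }
            return $buf;
        }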

    Cheers - L~R

Re: use File::Slurp for (!"speed");
by hardburn (Abbot) on Nov 08, 2004 at 14:31 UTC

    Usually when I slurp a file, it's not for speed. It's because the file format is easier to deal with if I grab it all at once. For instance, right now I'm building a module that has a fairly complex mini-language for its configuration file. I'm certain there won't be any speed benefit to slurping this file (a typical application will read the config exactly once at startup), but it's easier to feed Parse::RecDescent data that way.
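
    A minimal sketch of that pattern (the grammar and filename here are invented, and far simpler than a real config mini-language):

        use strict;
        use warnings;
        use Parse::RecDescent;

        # A toy grammar: "key = value" pairs standing in for the real
        # mini-language.
        my $grammar = q{
            config : pair(s)              { $item[1] }
            pair   : /\w+/ '=' /\S+/      { [ $item[1], $item[3] ] }
        };
        my $parser = Parse::RecDescent->new($grammar) or die "bad grammar";

        # Parse::RecDescent wants the whole text as one string, which is
        # exactly why slurping the config file is the natural fit.
        my $text  = do { local (@ARGV, $/) = 'app.conf'; <> };
        my $pairs = $parser->config($text) or die "parse failed";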

    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

Re: use File::Slurp for (!"speed");
by revdiablo (Prior) on Nov 08, 2004 at 18:20 UTC
    reading a file smaller than a disk block (because, hell: we're not using slurping for any kind of sizable file... who would be doing that and still be concerned about performance?)

    I think there is some area between a file "smaller than a disk block" and "any kind of sizable file." You only tested one end of the spectrum, leaving the middle untouched. Let's see how File::Slurp holds up with a bit more testing.

    First, the benchmark code:

    use File::Slurp;
    use Benchmark qw(cmpthese);

    cmpthese(-2, {
        fs => sub { $x = read_file "foo" },
        is => sub { $x = is("foo"); }
    });

    sub is { local (@ARGV, $/) = $_[0]; <> };

    Note, I put the idiomatic slurp into a subroutine. Much of File::Slurp's performance hit in the comparison was due to Perl's slow subroutine calls, so I thought it would be a more apples-to-apples comparison to add that same hit to the idiomatic slurp. Feel free to post results with the idiomatic slurp inlined, if you want, but all that will do is shift the point where File::Slurp overtakes the idiomatic slurp.
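
    For reference, the inlined variant (not benchmarked here) would look like this:

        use File::Slurp;
        use Benchmark qw(cmpthese);

        # Same comparison, but the idiomatic slurp pays no sub-call overhead.
        cmpthese(-2, {
            fs => sub { $x = read_file "foo" },
            is => sub { $x = do { local (@ARGV, $/) = "foo"; <> } },
        });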

    Now, some test runs:

    $ perl -e 'print "x"x500' > foo
    $ perl benchmark
           Rate    fs    is
    fs  30210/s    --  -33%
    is  44886/s   49%    --
    $ perl -e 'print "x"x5_000' > foo
    $ perl benchmark
           Rate    fs    is
    fs  27499/s    --  -26%
    is  37057/s   35%    --
    $ perl -e 'print "x"x50_000' > foo
    $ perl benchmark
           Rate    is    fs
    is  11275/s    --  -14%
    fs  13094/s   16%    --
    $ perl -e 'print "x"x500_000' > foo
    $ perl benchmark
          Rate   is   fs
    is   277/s   --  -15%
    fs   325/s  17%    --
    $ perl -e 'print "x"x5_000_000' > foo
    $ perl benchmark
          Rate    is    fs
    is  29.5/s    --  -17%
    fs  35.6/s   21%    --

    As we can see, the idiomatic slurp is faster for the 500 and 5,000 byte files, but once we get into the 50,000 range, File::Slurp takes the lead. You may consider 50k to be too big to slurp, but I certainly do not. In fact, I can even imagine circumstances where slurping the 5 MB file would be reasonable.

    Don't get me wrong. I'm not trying to disagree with your main point. In your particular case, switching to File::Slurp was probably not the right idea. But I certainly can see the case for slurping files that are 50k or more, and in this case, File::Slurp is faster.

      OK, let's not arbitrarily turn something into a function call that doesn't have to be, and, more importantly, let's recognize the overhead of loading a module:
      use Benchmark qw(cmpthese);

      cmpthese(-2, {
          fs => sub {
              delete $INC{'File/Slurp.pm'};
              require File::Slurp;
              File::Slurp->import();
              $x = read_file("foo")
          },
          is => sub { $x = do { local (@ARGV, $/) = "foo"; <> }; }
      });
      ---
      $ perl5.6.0 test.pl
      Benchmark: running fs, is, each for at least 2 CPU seconds...
              fs:  3 wallclock secs ( 2.08 usr +  0.11 sys =  2.19 CPU) @ 266.67/s (n=584)
              is:  2 wallclock secs ( 1.16 usr +  0.86 sys =  2.02 CPU) @ 47311.39/s (n=95569)
               Rate     fs     is
      fs      267/s     --   -99%
      is    47311/s 17642%     --
      And, yes, I am being a little tongue in cheek with that... but the point is: benchmarks are what you make of them.
      ------------ :Wq Not an editor command: Wq
        let's not arbitrarily turn something into a function call that doesn't have to be

        I didn't turn it into a function call arbitrarily. In fact, I explained my reasoning for doing so. I was attempting to compare the speed of File::Slurp to the idiomatic slurp. I was not attempting to measure the speed of Perl's subroutine calls, much less loading of modules. It's already well-established that these things are slow.

        This is how I see things so far: you stated that File::Slurp is slower than the idiomatic slurp. I demonstrated that it is, but only under very constrained circumstances. Then, you constrained the circumstances even further. I don't get the point you're going for here...

Re: use File::Slurp for (!"speed");
by ysth (Canon) on Nov 08, 2004 at 19:22 UTC
    To summarize revdiablo's results: if it's going to take more than 1/30000th of a second, File::Slurp is faster; otherwise the idiomatic slurp may be faster. Sounds like the OP here caught a serious case of premature optimization (ironically, while somewhat arguing against it).

    There are other ways to measure "efficiency". For instance, I know the difference between $/ and $\, but I'm still prone to mistype one for the other perhaps as much as 1% of the time. The time it takes to identify and correct this is a huge inefficiency; just using File::Slurp eliminates this problem, at least in the case of file slurping.
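
    For the record, the two easily-confused variables (the filename below is invented):

        use strict;
        use warnings;

        # $/ is the *input* record separator: what readline()/<> stops at.
        # $\ is the *output* record separator: appended by every print.
        {
            local $\ = "\n";          # print now supplies the trailing newline
            print "no explicit \\n needed";
        }
        {
            local $/;                 # undef: <> reads the whole file at once
            open my $fh, '<', 'some_file' or die "open: $!";
            my $all = <$fh>;          # one read, entire file
        }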

Re: use File::Slurp for (!"speed");
by diotalevi (Canon) on Nov 08, 2004 at 19:50 UTC
    I use File::Slurp because then I can write read_file( ... ) instead of do { local @ARGV = ...; local $/; scalar <> }. The former is easier to read. I also don't typically write my own read_file because when I do, it often comes out to about a dozen lines, and I'd prefer to let modules handle plumbing tasks that would otherwise just be clutter. I also share ysth's inability to remember which of $/ and $\ controls the EOL for readline and the EOL for print.
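
    For a sense of what "about a dozen lines" means, here is a sketch of a hand-rolled read_file with the usual error-checking plumbing (not File::Slurp's implementation):

        use strict;
        use warnings;
        use Carp;

        sub my_read_file {
            my ($file) = @_;
            open my $fh, '<', $file
                or croak "can't open '$file': $!";
            local $/;                         # slurp mode
            my $contents = <$fh>;
            croak "can't read '$file': $!" unless defined $contents;
            close $fh or croak "can't close '$file': $!";
            return $contents;
        }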
