PerlMonks  

Re: Slurp a file

by dkubb (Deacon)
on Jan 15, 2001 at 13:46 UTC (#51885)


in reply to Slurp a file

My favorite way of reading the contents of a filehandle into a scalar is:

sysread FILEHANDLE, my $file_contents, -s FILEHANDLE;

In the benchmarks I've run, this is the fastest and most elegant way I've found.
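
A slightly more defensive form of the one-liner above might look like the following sketch. The helper name slurp_sysread, the lexical filehandle, and the error handling are assumptions added for illustration and are not part of the original post; the core is still a single sysread sized by -s.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): slurp a plain file
# with one sysread call, using -s to size the read.
sub slurp_sysread {
    my ($path) = @_;

    open(my $fh, '<', $path) or die "$path: $!\n";
    binmode($fh);    # read raw bytes, no layer translation

    my $size     = (-s $fh) || 0;    # -s is false for an empty file
    my $contents = '';

    if ($size > 0) {
        my $got = sysread($fh, $contents, $size);
        defined $got  or die "sysread $path: $!\n";
        $got == $size or warn "$path: expected $size bytes, got $got\n";
    }

    close($fh);
    return $contents;
}
```

Calling slurp_sysread($0) would return the running script's own source, much as the benchmark in this thread reads $0.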


(because I can) Re (2): Slurp a file
by mwp (Hermit) on Jan 15, 2001 at 15:58 UTC
    Modifying jeroenes's modification of davorg's modified benchmark (threw in tye's neat trick for good measure):
                Rate   join   argv hybrid linesep sysread
    join      5459/s     --   -44%   -71%    -81%    -91%
    argv      9690/s    78%     --   -49%    -67%    -85%
    hybrid   19084/s   250%    97%     --    -34%    -70%
    linesep  29070/s   433%   200%    52%      --    -55%
    sysread  64103/s  1074%   562%   236%    121%      --

    Nice, dkubb! I think we have a winner. Not only is _sysread() clearly the fastest, but it burns the least CPU*. I like it! Hopefully there are no "catches."

    update: _argv() is almost as fast as _linesep() when working with large files. _join() and _hybrid() tend to fall behind.

    * Linux 2.4, Celeron-366, Perl 5.6.0


    #!/usr/bin/perl -w
    use strict;
    use Benchmark qw(cmpthese);

    my $file = $0;
    open(IN, $file) or die "$file: $!\n";

    cmpthese(50_000, {
        join    => \&_join,
        linesep => \&_linesep,
        hybrid  => \&_hybrid,
        sysread => \&_sysread,
        argv    => \&_argv
    });

    close(IN);

    sub _join    { seek(IN, 0, 0); my $content = join '', <IN>; }
    sub _linesep { seek(IN, 0, 0); my $content = do { local $/; <IN> } }
    sub _hybrid  { seek(IN, 0, 0); my $content = do { local $/; join '', <IN> } }
    sub _sysread { seek(IN, 0, 0); sysread IN, my $content, -s IN }
    sub _argv    { my $content = do { local(*ARGV, $/); @ARGV = ($file); <> } }

(tye)Re5: Slurp a file
by tye (Cardinal) on Jan 15, 2001 at 21:19 UTC

    Unfortunately, this isn't completely portable. There are minor (as far as I can tell) problems with some operating systems where the size of a file on disk doesn't always match the size of data read into memory.

    The big problems have to do with the great many types of filehandles where -s can't tell the file size.

    But, yes, when it works, this is a neat trick. Thanks.

            - tye (but my friends call me "Tye")
      Just a few questions:

      What types of filehandles would not return the correct size with -s?

      And do you know a better way than -s to get the size of the data behind a filehandle?

        What types of filehandles would not return the correct size with -s?

        Mostly pipes and user input devices (terminals/consoles). -s pretty much only works on filehandles connected to ordinary files. For the other cases, the only way to determine the amount of data before EOF is to read it all.

                - tye (but my friends call me "Tye")
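
One way to act on the caveats above is to use -s only as a hint and keep reading until EOF. The following is a minimal sketch under that assumption, not something posted in the thread: the helper name slurp_handle and the 64 KB chunk size are hypothetical choices. It takes the single-sysread fast path when -f reports a plain file, then loops so that pipes, terminals, and files whose on-disk size does not match the data read are still consumed completely.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): slurp any readable handle.
sub slurp_handle {
    my ($fh) = @_;
    my $contents = '';

    # Fast path: -f is true only for plain files, where -s is meaningful.
    if (-f $fh) {
        my $size = (-s $fh) || 0;
        if ($size > 0) {
            my $got = sysread($fh, $contents, $size);
            defined $got or die "sysread: $!\n";
        }
    }

    # Read whatever is left (or everything, for pipes and terminals)
    # in chunks, appending at the end of the buffer, until sysread
    # returns 0 at EOF.
    while (1) {
        my $got = sysread($fh, $contents, 65536, length $contents);
        defined $got or die "sysread: $!\n";
        last if $got == 0;
    }

    return $contents;
}
```

For a plain file this costs one extra sysread call that returns 0, a small price compared with silently truncating input from a pipe.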
