PerlMonks  

Re: How does the while works in case of Filehandle when reading a gigantic file in Perl

by reisinge (Hermit)
on Jan 30, 2015 at 14:17 UTC ( [id://1115071] )


in reply to How does the while works in case of Filehandle when reading a gigantic file in Perl

Since while evaluates its condition in scalar context, and the line input operator (<>) in scalar context returns the next line of input (or undef at end-of-file), that code processes the file line by line.

On the other hand, when <> is evaluated in list context, it returns a list of all lines from the file (they all go into memory!).

while (<$fh>) {     # Read line by line; generally the preferred way (especially for large files)
    print "The line is: $_";
}

foreach (<$fh>) {   # Reads all lines into memory at once
    print "The line is: $_";
}

Replies are listed 'Best First'.
Re^2: How does the while works in case of Filehandle when reading a gigantic file in Perl
by sundialsvc4 (Abbot) on Jan 30, 2015 at 15:00 UTC

    /me nods ...

    A very compelling explanation for “exponentially slower” would be that this program is, indeed, reading the entire file into memory and trying to process it that way. The process’s working set tries to become larger than the file itself, the memory-manager of the system can’t accommodate that, and the system starts “thrashing.” If you graph the performance curve of that, it is “exponential.”

    If a file is indeed being read linearly, without stuffing it all into memory, then the completion-time profile ought to be linear: the “lines processed per millisecond” should be more or less constant, and the working-set size of the process (as seen by top or somesuch) should not vary according to the size of the file being processed. If the file is twice as big, it should take about twice as long, all other things being equal. So, if that is not what is seen to be happening, “a great big slurp” is almost certain to be the explanation, and the good news is that it should be quite easy to fix.
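    The check described above can be sketched in a few lines. This is a minimal illustration, not anyone's actual program: the filename and the 50,000-line demo file are made up, and the per-chunk timestamps are what you would eyeball for the "roughly constant rate" test.

    ```perl
    use strict;
    use warnings;
    use Time::HiRes qw(time);

    # Demo setup: write a modest test file (filename is hypothetical).
    open my $mk, '>', 'big.txt' or die "Cannot create big.txt: $!";
    print {$mk} "data line $_\n" for 1 .. 50_000;
    close $mk;

    # If the file is read linearly, line by line, the elapsed time per
    # 10,000-line chunk stays roughly flat and memory use does not grow
    # with file size; a slurp-then-thrash program would slow down instead.
    open my $fh, '<', 'big.txt' or die "Cannot open big.txt: $!";
    my $t0    = time;
    my $lines = 0;
    while (my $line = <$fh>) {
        $lines++;
        printf "lines: %d  elapsed: %.3fs\n", $lines, time - $t0
            if $lines % 10_000 == 0;
    }
    close $fh;
    ```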

      "a great big slurp" is almost certain to be the explanation

      raj4489 said "I use while for reading it line by line".

        Actually sundialsvc4 is close to the truth (but not quite there yet).

        The program repeatedly opens the output file to append a new line to it. That means the OS must somehow find the end of the file. While that does not necessarily mean the OS has to "slurp" the whole file, it does have to walk the chain of disk blocks to find where the file ends, read the last few sectors into its buffer, append the new line, and write the buffer back to disk (either immediately or when the file is closed). And all of this is done for each line added, since the program opens and closes the output file for every new line.
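        The fix this implies is to open both handles once, outside the loop, and stream through the input. A minimal sketch, with hypothetical filenames and demo setup (the slow per-line re-open is shown only as a comment):

        ```perl
        use strict;
        use warnings;

        # Demo setup: create a small input file (name is hypothetical).
        open my $tmp, '>', 'input.txt' or die "Cannot create input.txt: $!";
        print {$tmp} "line 1\n", "line 2\n";
        close $tmp;
        unlink 'output.txt';    # start the append demo from a clean slate

        # Slow pattern (what the original program apparently did): re-open
        # the output file in append mode for every single line, forcing the
        # OS to locate end-of-file on each open:
        #     open my $out, '>>', 'output.txt' or die;   # inside the loop!

        # Faster: open both handles once and stream line by line.
        open my $in,  '<',  'input.txt'  or die "Cannot open input.txt: $!";
        open my $out, '>>', 'output.txt' or die "Cannot open output.txt: $!";
        while (my $line = <$in>) {
            print {$out} $line;
        }
        close $in;
        close $out or die "Error closing output.txt: $!";
        ```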

        CountZero

        "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        My blog: Imperial Deltronics
