How does while work with a filehandle when reading a gigantic file in Perl?

by raj4489 (Acolyte)
on Jan 30, 2015 at 11:15 UTC

raj4489 has asked for the wisdom of the Perl Monks concerning the following question:

I have a very large file to read. When I use while to read it line by line, the script takes more and more time to read each line the deeper I get into the file, and the rise is exponential.

while (<$fh>) { do something }

Does while have to parse through all the lines it has already read to get to the next unread line, or something like that? How can I overcome this?

EDIT 1

perloHolic() has posted my code below.

And I want to add that I have tested the timing of each step individually, but all the steps are linear except reading a new line from the file.

Thanks in advance


Re: How does while work with a filehandle when reading a gigantic file in Perl?
by flexvault (Monsignor) on Jan 30, 2015 at 12:30 UTC

    raj4489,

    You can check this yourself with the following untested code:

    use strict;
    use warnings;
    use Time::HiRes qw( gettimeofday );

    # First pass: an empty loop, to measure the raw read time.
    open( my $fh, '<', './something.txt' ) or die "! open $!\n";
    my $stime = gettimeofday;
    while (<$fh>) { }
    print gettimeofday - $stime, "\n";
    close $fh;

    # Second pass: the same loop with your real work inside.
    open( $fh, '<', './something.txt' ) or die "! open $!\n";
    $stime = gettimeofday;
    while (<$fh>) {
        # do something is your actual code!
    }
    print gettimeofday - $stime, "\n";
    close $fh;
    Now you have the time for the 'while' loop both without and with your actual 'do something' code.

    I suspect your exponential growth comes from something you're doing in the 'do something' part of the script. Post it and we may be able to help improve the process.

    Regards...Ed

    "Well done is better than well said." - Benjamin Franklin

      perloHolic() has posted my code, and I have checked the timings for each step separately; all are linear. The problem lies with 'while', because the time requirement goes up for every new line.

        There is nothing about a while (<>) { ... } loop itself that would make the time per line go up. Maybe some of your processing is consuming memory or accumulating data in an array that gets larger and larger without ever being cleared. We will need to see more accurate code than the reduced version that was posted.
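
        For illustration, a hypothetical sketch of the kind of accumulation that produces this pattern; the growing @seen array is invented for the example, not taken from the posted code:

        my @seen;
        while (<$fh>) {
            push @seen, $_;               # the array grows with every line read...
            for my $old (@seen) {         # ...and is rescanned for every line,
                # compare $_ to $old ...  # so total work grows quadratically
            }
        }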

        If the time taken per line gets larger and larger, maybe you can post a small/short XML example and some code more to the point so we can try to replicate the problem? Please also make sure that the problem appears with the code you post.

        If you do other work, like for example, inserting the data into a database instead of writing it to a file, that could get slower with each new row that gets added.

Re: How does while work with a filehandle when reading a gigantic file in Perl?
by Robidu (Acolyte) on Jan 30, 2015 at 12:05 UTC
    The time used up most likely depends on your {do something}: the while loop itself resumes reading from your file at the very spot where the previous read stopped, up until an EOF is encountered. I therefore suspect that the loop's body somehow causes things to bog down, but in order to shed more light on the issue, I would have to know exactly what's going on in {do something}...
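
    A tiny demonstration of that, assuming some file something.txt: tell reports the handle's byte offset, which only ever moves forward, so nothing already read is visited again:

    open my $fh, '<', 'something.txt' or die $!;
    while (<$fh>) {
        printf "line %d read, handle now at byte offset %d\n", $., tell($fh);
    }
    close $fh;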
Re: How does while work with a filehandle when reading a gigantic file in Perl?
by reisinge (Hermit) on Jan 30, 2015 at 14:17 UTC

    Since while evaluates its condition in scalar context, and in scalar context the line input operator (<>) returns the next line of input (or undef on end-of-file), that code processes the file line by line.

    On the other hand, when <> is evaluated in list context, it returns a list of all lines from the file (they all go into memory!).

    while (<$fh>) {    # Read line by line; generally the preferred way (especially for large files)
        print "The line is: $_";
    }

    foreach (<$fh>) {  # Read all lines at once
        print "The line is: $_";
    }

      /me nods ...

      A very compelling explanation for “exponentially slower” would be that this program is, indeed, reading the entire file into memory and trying to process it that way. The process's working set tries to become larger than the file itself, the memory manager of the system can't accommodate that, and the system starts “thrashing.” If you graph the performance curve of that, it is “exponential.”

      If a file is indeed being read linearly, without stuffing it all into memory, then the completion-time profile ought to be linear: the “lines processed per millisecond” should be more or less constant, and the working-set size of the process (as seen by top or somesuch) should not vary with the size of the file being processed. If the file is twice as big, it should take about twice as long, all other things being equal. So, if that is not what is seen to be happening, “a great big slurp” is almost certain to be the explanation, and the good news is that it should be quite easy to fix.

        "a great big slurp" is almost certain to be the explanation

        raj4489 said "I use while for reading it line by line".

Re: How does while work with a filehandle when reading a gigantic file in Perl?
by perloHolic() (Beadle) on Jan 30, 2015 at 14:13 UTC

    I believe this to be a duplicate of a question on Stack Overflow, in which case I believe this is the code inside the {do something} found there. Hopefully having the actual code within your loop posted here will help you get an answer to your problem. Hope this helps...

    $line = 0;
    %values;
    open my $fh1, '<', "file.xml" or die $!;
    while (<$fh1>) {
        $line++;
        if ( $_ =~ s/foo//gi ) {
            chomp $_;
            $values{'id'} = $_;
        }
        elsif ( $_ =~ s/foo//gi ) {
            chomp $_;
            $values{'type'} = $_;
        }
        elsif ( $_ =~ s/foo//gi ) {
            chomp $_;
            $values{'pattern'} = $_;
        }
        if ( keys(%values) == 3 ) {
            open FILE, ">>temp.txt" or die $!;
            print FILE "$values{'id'}\t$values{'type'}\t$values{'pattern'}\n";
            close FILE;
            %values = ();
        }
        if ( $line == ( $line1 + 1000000 ) ) {
            $line1           = $line;
            $read_time       = time();
            $processing_time = $read_time - $start_time - $processing_time;
            print "xml file parsed till line $line, time taken $processing_time sec\n";
        }
    }
      You are opening and closing your output file for every line you read.

      Opening a file is timewise an "expensive" operation. As the file is opened for appending new lines to it, opening the file will take longer and longer, since the OS has to find the end of file to add the new line to it.

      Solution: open the output file in the beginning of your script and close it at the end. You should see an immediate speed improvement.
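
      A minimal sketch of that change, keeping the file names from the posted code (the code that populates %values is elided):

      use strict;
      use warnings;

      # Open both files once, outside the loop.
      open my $in,  '<',  'file.xml' or die "Can't read file.xml: $!";
      open my $out, '>>', 'temp.txt' or die "Can't append to temp.txt: $!";

      my %values;
      while (<$in>) {
          # ... populate %values from the current line, as in the posted code ...
          if ( keys %values == 3 ) {
              print {$out} join( "\t", @values{qw(id type pattern)} ), "\n";
              %values = ();
          }
      }

      close $out;
      close $in;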

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      My blog: Imperial Deltronics

        I have checked the timings of all the steps individually, and no step in the while loop takes any more time (each remains constant). It is only reading the next line at the top of the while that keeps taking longer for every new line.

      perloHolic(): Can you post a link to the SO question so we can follow along? (This is really the responsibility of raj4489, but he or she is new to this site and may be unfamiliar with the ins-and-outs of PM and/or SO.)


      Give a man a fish:  <%-(-(-(-<

      Thanks for posting my code

Re: How does while work with a filehandle when reading a gigantic file in Perl?
by LanX (Saint) on Jan 30, 2015 at 12:04 UTC
    Depends, have a look at seek and tell

    Or maybe you want to read in larger chunks ... ?
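
    For illustration only, a rough sketch of both ideas; the file name, byte offset, and chunk size here are made up:

    use strict;
    use warnings;

    # Remember a position with tell, then jump straight back to it with seek:
    open my $fh, '<', 'big.log' or die "open: $!";
    my $pos = 0;
    while (<$fh>) {
        # ... process the line ...
        $pos = tell $fh;             # byte offset where the next line starts
        last if $pos > 1_000_000;    # stop early in this session
    }
    close $fh;

    open $fh, '<', 'big.log' or die "open: $!";
    seek $fh, $pos, 0;               # 0 = SEEK_SET: absolute offset
    while (<$fh>) {
        # ... continue from where we left off, without re-reading anything ...
    }
    close $fh;

    # Or read in larger chunks instead of line by line:
    open $fh, '<', 'big.log' or die "open: $!";
    while ( read( $fh, my $buf, 64 * 1024 ) ) {
        # ... process up to 64 KB of $buf at a time ...
    }
    close $fh;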

    Cheers Rolf

    PS: Je suis Charlie!

    PS:

    > Does while have to parse through all the lines it has already read to get to the next unread line, or something like that?

    Not if you open the file just once!

Re: How does while work with a filehandle when reading a gigantic file in Perl?
by sandy105 (Scribe) on Jan 30, 2015 at 11:30 UTC

    Does while have to parse through all the lines it has already read to get to the next unread line, or something like that?

    To answer that: no, it does not! If you are making one pass through the file, it should not take much time, unless it is a really huge file. Since there is no more info, I can only suggest you use { do something } or next; to skip the lines you don't need.

Re: How does while work with a filehandle when reading a gigantic file in Perl?
by pme (Monsignor) on Jan 30, 2015 at 12:01 UTC
    Hi raj4489, welcome to the monastery!

    Could you share your real source? Especially, what does it do with the variable $_?
