How does while work with a filehandle when reading a gigantic file in Perl?

by raj4489 (Acolyte)
on Jan 30, 2015 at 11:15 UTC

raj4489 has asked for the wisdom of the Perl Monks concerning the following question:

I have a very large file to read. When I use while to read it line by line, the script takes more and more time to read each line the deeper I get into the file, and the rise is exponential.

while (<$fh>) { do something }

Does while have to parse through all the lines it has already read to get to the next unread line, or something like that? How can I overcome this?

EDIT 1

perloHolic() has posted my code below.

And I want to add that I have tested the timing of each step individually, but all the steps are linear except reading a new line from the file.

Thanks in advance


Re: How does while work with a filehandle when reading a gigantic file in Perl?
by flexvault (Monsignor) on Jan 30, 2015 at 12:30 UTC

    raj4489,

    You can check this yourself with the following untested code:

    use strict;
    use warnings;
    use Time::HiRes qw( gettimeofday );

    # First pass: an empty loop, to measure the raw read time.
    open( my $fh, '<', './something.txt' ) or die "! open $!\n";
    my $stime = gettimeofday;
    while (<$fh>) { }
    print gettimeofday - $stime, "\n";
    close $fh;

    # Second pass: the same loop with your real work inside.
    open( $fh, '<', './something.txt' ) or die "! open $!\n";
    $stime = gettimeofday;
    while (<$fh>) {
        # do something is your actual code!
    }
    print gettimeofday - $stime, "\n";
    close $fh;
    Now you have the time for the 'while' loop both without and with your actual 'do something' code.

    I suspect your exponential growth comes from something you're doing in the 'do something' part of the script. Post it and we may be able to help improve the process.

    Regards...Ed

    "Well done is better than well said." - Benjamin Franklin

      perloHolic() has posted my code, and I have checked the timings for each step separately; all are linear. The problem lies with 'while', because the time requirement goes up for every new line.

        There is nothing about a while (<>) { ... } loop itself that would make the time per line go up. Maybe some of your processing is consuming memory or accumulating data in an array that gets larger and larger without ever being cleared. We will need to see more accurate code than the reduced version that was posted.
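
        For illustration, a hypothetical sketch of the kind of accumulation that produces this pattern; the growing @seen array is invented for the example, not taken from the posted code:

        my @seen;
        while (<$fh>) {
            push @seen, $_;               # the array grows with every line read...
            for my $old (@seen) {         # ...and is rescanned for every line,
                # compare $_ to $old ...  # so total work grows quadratically
            }
        }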

        If the time taken per line gets larger and larger, maybe you can post a small/short XML example and some code more to the point so we can try to replicate the problem? Please also make sure that the problem appears with the code you post.

        If you do other work, like for example, inserting the data into a database instead of writing it to a file, that could get slower with each new row that gets added.

Re: How does while work with a filehandle when reading a gigantic file in Perl?
by Robidu (Acolyte) on Jan 30, 2015 at 12:05 UTC
    The time used up most likely depends on your {do something}: the while loop itself resumes reading from your file at the very spot where the previous read stopped, up until an EOF is encountered. I therefore suspect that the loop's body somehow causes things to bog down, but in order to shed more light on the issue, I would have to know exactly what's going on in {do something}...
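
    A tiny demonstration of that, assuming some file something.txt: tell reports the handle's byte offset, which only ever moves forward, so nothing already read is visited again:

    open my $fh, '<', 'something.txt' or die $!;
    while (<$fh>) {
        printf "line %d read, handle now at byte offset %d\n", $., tell($fh);
    }
    close $fh;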
Re: How does while work with a filehandle when reading a gigantic file in Perl?
by reisinge (Hermit) on Jan 30, 2015 at 14:17 UTC

    Since while evaluates its condition in scalar context, and in scalar context the line input operator (<>) returns the next line of input (or undef on end-of-file), that code processes the file line by line.

    On the other hand, when <> is evaluated in list context, it returns a list of all lines from the file (they all go into memory!).

    while (<$fh>) {    # Read line by line; generally the preferred way (especially for large files)
        print "The line is: $_";
    }

    foreach (<$fh>) {  # Read all lines at once
        print "The line is: $_";
    }

      /me nods ...

      A very compelling explanation for “exponentially slower” would be that this program is, indeed, reading the entire file into memory and trying to process it that way. The process's working set tries to become larger than the file itself, the memory manager of the system can't accommodate that, and the system starts “thrashing.” If you graph the performance curve of that, it is “exponential.”

      If a file is indeed being read linearly, without stuffing it all into memory, then the completion-time profile ought to be linear: the “lines processed per millisecond” should be more or less constant, and the working-set size of the process (as seen by top or somesuch) should not vary with the size of the file being processed. If the file is twice as big, it should take about twice as long, all other things being equal. So, if that is not what is seen to be happening, “a great big slurp” is almost certain to be the explanation, and the good news is that it should be quite easy to fix.

        "a great big slurp" is almost certain to be the explanation

        raj4489 said "I use while for reading it line by line".

Re: How does while work with a filehandle when reading a gigantic file in Perl?
by perloHolic() (Beadle) on Jan 30, 2015 at 14:13 UTC

    I believe this to be a duplicate of a question on Stack Overflow, in which case I believe this is the code inside the {do something} found there. Hopefully having the actual code within your loop posted here will help you get an answer to your problem. Hope this helps...

    $line = 0;
    %values;
    open my $fh1, '<', "file.xml" or die $!;
    while (<$fh1>) {
        $line++;
        if ( $_ =~ s/foo//gi ) {
            chomp $_;
            $values{'id'} = $_;
        }
        elsif ( $_ =~ s/foo//gi ) {
            chomp $_;
            $values{'type'} = $_;
        }
        elsif ( $_ =~ s/foo//gi ) {
            chomp $_;
            $values{'pattern'} = $_;
        }
        if ( keys(%values) == 3 ) {
            open FILE, ">>temp.txt" or die $!;
            print FILE "$values{'id'}\t$values{'type'}\t$values{'pattern'}\n";
            close FILE;
            %values = ();
        }
        if ( $line == ( $line1 + 1000000 ) ) {
            $line1           = $line;
            $read_time       = time();
            $processing_time = $read_time - $start_time - $processing_time;
            print "xml file parsed till line $line, time taken $processing_time sec\n";
        }
    }
      You are opening and closing your output file for every line you read.

      Opening a file is timewise an "expensive" operation. As the file is opened for appending new lines to it, opening the file will take longer and longer, since the OS has to find the end of file to add the new line to it.

      Solution: open the output file in the beginning of your script and close it at the end. You should see an immediate speed improvement.
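
      A minimal sketch of that change, keeping the file names from the posted code (the code that populates %values is elided):

      use strict;
      use warnings;

      # Open both files once, outside the loop.
      open my $in,  '<',  'file.xml' or die "Can't read file.xml: $!";
      open my $out, '>>', 'temp.txt' or die "Can't append to temp.txt: $!";

      my %values;
      while (<$in>) {
          # ... populate %values from the current line, as in the posted code ...
          if ( keys %values == 3 ) {
              print {$out} join( "\t", @values{qw(id type pattern)} ), "\n";
              %values = ();
          }
      }

      close $out;
      close $in;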

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      My blog: Imperial Deltronics

        I have checked the timings of all the steps individually, and no step in the while loop takes any more time (each remains constant). It is only reading the next line at the top of the while that keeps taking longer for every new line.

      perloHolic(): Can you post a link to the SO question so we can follow along? (This is really the responsibility of raj4489, but he or she is new to this site and may be unfamiliar with the ins-and-outs of PM and/or SO.)


      Give a man a fish:  <%-(-(-(-<

      Thanks for posting my code

Re: How does while work with a filehandle when reading a gigantic file in Perl?
by LanX (Saint) on Jan 30, 2015 at 12:04 UTC
    Depends, have a look at seek and tell

    Or maybe you want to read in larger chunks ... ?
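
    For illustration only, a rough sketch of both ideas; the file name, byte offset, and chunk size here are made up:

    use strict;
    use warnings;

    # Remember a position with tell, then jump straight back to it with seek:
    open my $fh, '<', 'big.log' or die "open: $!";
    my $pos = 0;
    while (<$fh>) {
        # ... process the line ...
        $pos = tell $fh;             # byte offset where the next line starts
        last if $pos > 1_000_000;    # stop early in this session
    }
    close $fh;

    open $fh, '<', 'big.log' or die "open: $!";
    seek $fh, $pos, 0;               # 0 = SEEK_SET: absolute offset
    while (<$fh>) {
        # ... continue from where we left off, without re-reading anything ...
    }
    close $fh;

    # Or read in larger chunks instead of line by line:
    open $fh, '<', 'big.log' or die "open: $!";
    while ( read( $fh, my $buf, 64 * 1024 ) ) {
        # ... process up to 64 KB of $buf at a time ...
    }
    close $fh;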

    Cheers Rolf

    PS: Je suis Charlie!

    PS:

    > Does while have to parse through all the lines it has already read to get to the next unread line, or something like that?

    Not if you open the file just once!

Re: How does while work with a filehandle when reading a gigantic file in Perl?
by sandy105 (Scribe) on Jan 30, 2015 at 11:30 UTC

    Does while have to parse through all the lines it has already read to get to the next unread line, or something like that?

    To answer that: no, it does not! If you are making one pass through the file, it should not take much time, unless it is a really huge file. Since there is no more info, I can only suggest you use { do something } or next; to skip the lines you don't need.

Re: How does while work with a filehandle when reading a gigantic file in Perl?
by pme (Monsignor) on Jan 30, 2015 at 12:01 UTC
    Hi raj4489, welcome to the monastery!

    Could you share your real source? Especially, what does it do with the variable $_?
