Re: How does the while works in case of Filehandle when reading a gigantic file in Perl
by flexvault (Monsignor) on Jan 30, 2015 at 12:30 UTC
use strict;
use warnings;
use Time::HiRes qw( gettimeofday );

open( my $fh, "<", "./something.txt" ) or die "! open $!\n";
my $stime = gettimeofday;
while (<$fh>)
{ }
print gettimeofday - $stime, "\n";
close $fh;

open( $fh, "<", "./something.txt" ) or die "! open $!\n";
$stime = gettimeofday;
while (<$fh>)
{
    # 'do something' is your actual code!
}
print gettimeofday - $stime, "\n";
close $fh;
Now you have the time for the 'while' loop with and without your 'do something' actual code.
I suspect your exponential growth comes from something you're doing in the 'do something' part of the script.
Post it and we may be able to help improve the process.
Regards...Ed
"Well done is better than well said." - Benjamin Franklin
There is nothing about a while (<$fh>) { ... } loop itself that would make the time per line go up. Maybe some of your processing is consuming memory or accumulating data in an array that gets larger and larger without ever being cleared. We will need to see more representative code than the reduced version that was posted.
If the time taken per line gets larger and larger, maybe you can post a small/short XML example and some code more to the point so we can try to replicate the problem? Please also make sure that the problem appears with the code you post.
If you do other work, for example inserting the data into a database instead of writing it to a file, that could get slower with each new row that gets added.
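As a self-contained sketch of the kind of accumulation being suspected here (all names invented for the illustration): a structure that is pushed to on every line but never cleared makes any per-line work that scans it slower with each iteration.

```perl
use strict;
use warnings;

my @seen;            # grows forever -- the suspected bug pattern
my $rescans = 0;
for my $line (1 .. 1_000) {
    push @seen, $line;
    # e.g. a duplicate check that rescans the whole array each time:
    $rescans++ if grep { $_ == $line } @seen;   # O(n) per line, O(n^2) overall
}
print scalar(@seen), "\n";   # prints 1000 -- nothing was ever cleared
```

Clearing the array (or avoiding the full rescan) once each record is handled keeps the per-line cost constant.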
Re: How does the while works in case of Filehandle when reading a gigantic file in Perl
by Robidu (Acolyte) on Jan 30, 2015 at 12:05 UTC
The time used up likely depends on your {do something}: the while loop itself resumes reading your file at the very spot where the previous read stopped, up until an EOF is encountered.
I therefore suspect that the loop's body somehow causes things to bog down - but in order to shed more light on the issue, I would have to know exactly what's going on in {do something}...
Re: How does the while works in case of Filehandle when reading a gigantic file in Perl
by reisinge (Hermit) on Jan 30, 2015 at 14:17 UTC
Since while evaluates its condition in scalar context, and in scalar context the line input operator (<>) returns the next line of input (or undef at end-of-file), that code processes the file line by line.
On the other hand, when <> is evaluated in list context, it returns a list of all lines from the file (they all go into memory!).
while (<$fh>) {   # Read line by line; generally the preferred way
                  # (especially for large files)
    print "The line is: $_";
}

foreach (<$fh>) { # Read all lines at once
    print "The line is: $_";
}
/me nods ...
A very compelling explanation for “exponentially slower” would be that this program is, indeed, reading the entire file into memory and trying to process it that way. The process’s working set grows to at least the size of the file, the memory manager of the system can’t accommodate that, and the system starts “thrashing.” If you graph the performance curve of that, it looks “exponential.”
If a file is indeed being read linearly, without stuffing it all into memory, then the completion time profile ought to be linear: the “lines processed per millisecond” should be more or less constant, and the working-set size of the process (as seen by top or somesuch) should not vary according to the size of the file being processed. If the file is twice as big, it should take about twice as long, all other things being equal. So, if that is not what is seen to be happening, “a great big slurp” is almost certain to be the explanation, and the good news is that it should be quite easy to fix.
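One way to check the “lines processed per millisecond should be more or less constant” claim is to time fixed-size batches of lines (a sketch; the batch size, line count, and in-memory line list are arbitrary stand-ins for the real file):

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# For a true line-by-line pass the per-batch times should stay
# roughly flat; steadily growing batch times point at accumulation.
my @lines = map { "row $_\n" } 1 .. 50_000;   # stand-in for the real file
my $batch = 10_000;                           # arbitrary batch size
my $count = 0;
my $t0    = [gettimeofday];
for my $line (@lines) {
    $count++;
    # ... per-line work would go here ...
    if ($count % $batch == 0) {
        printf "lines %6d: %.4fs for this batch\n", $count, tv_interval($t0);
        $t0 = [gettimeofday];
    }
}
```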
Re: How does the while works in case of Filehandle when reading a gigantic file in Perl
by perloHolic() (Beadle) on Jan 30, 2015 at 14:13 UTC
I believe this to be a duplicate of a question on Stack Overflow, in which case I believe the code below, found on Stack Overflow, to be what is inside the {do something..}. Hopefully having the actual code from within your loop posted here will help you get an answer to your problem. Hope this helps...
$line = 0;
%values;
open my $fh1, '<', "file.xml" or die $!;
while (<$fh1>)
{
    $line++;
    if ($_ =~ s/foo//gi)
    {
        chomp $_;
        $values{'id'} = $_;
    }
    elsif ($_ =~ s/foo//gi)
    {
        chomp $_;
        $values{'type'} = $_;
    }
    elsif ($_ =~ s/foo//gi)
    {
        chomp $_;
        $values{'pattern'} = $_;
    }
    if (keys(%values) == 3)
    {
        open FILE, ">>temp.txt" or die $!;
        print FILE "$values{'id'}\t$values{'type'}\t$values{'pattern'}\n";
        close FILE;
        %values = ();
    }
    if ($line == ($line1 + 1000000))
    {
        $line1 = $line;
        $read_time = time();
        $processing_time = $read_time - $start_time - $processing_time;
        print "xml file parsed till line $line, time taken $processing_time sec\n";
    }
}
You are opening and closing your output file for every line you read. Opening a file is, time-wise, an "expensive" operation. As the file is opened for appending new lines to it, opening the file will take longer and longer, since the OS has to find the end of the file to add the new line to it. Solution: open the output file once at the beginning of your script and close it at the end. You should see an immediate speed improvement.
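A minimal, self-contained sketch of that fix (the tiny temp files here just stand in for the real file.xml and temp.txt):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Tiny demo input standing in for the big XML file.
my ($mk, $in_name) = tempfile();
print {$mk} "line $_\n" for 1 .. 3;
close $mk;

my ($mk2, $out_name) = tempfile();
close $mk2;

# The fix: open the output file ONCE, before the loop...
open my $in,  '<',  $in_name  or die "open $in_name: $!";
open my $out, '>>', $out_name or die "open $out_name: $!";
while (my $line = <$in>) {
    print {$out} $line;   # ...so each iteration is just a write, not a re-open
}
close $out;
close $in;

open my $check, '<', $out_name or die "open $out_name: $!";
my @copied = <$check>;
close $check;
print scalar(@copied), " lines copied\n";   # prints "3 lines copied"
```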
CountZero "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James My blog: Imperial Deltronics
I have checked the timings for all the steps individually, and no step in the while loop takes any more time (it remains constant). But just from entering the while, the time keeps increasing for every new line.
Re: How does the while works in case of Filehandle when reading a gigantic file in Perl
by LanX (Saint) on Jan 30, 2015 at 12:04 UTC
Depends, have a look at seek and tell
Or maybe you want to read in larger chunks...
?
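A small self-contained sketch of reading in fixed-size chunks with read, plus tell to see the current byte offset (the 64 KiB chunk size and the temp demo file are just examples):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Demo file to read from (100 kB of data).
my ($mk, $name) = tempfile();
print {$mk} 'x' x 100_000;
close $mk;

open my $in, '<', $name or die "open $name: $!";
binmode $in;
my $total = 0;
while (my $got = read($in, my $buf, 65_536)) {   # up to 64 KiB per read
    $total += $got;                              # process $buf here
}
print "read $total bytes, offset now ", tell($in), "\n";
close $in;
# prints "read 100000 bytes, offset now 100000"
```

seek would let you jump back to a saved tell position instead of re-reading from the start.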
PS:
> Does while has to parse through all the lines it has already read to go to the next unread line or something like that?
Not if you open the file just once!
Re: How does the while works in case of Filehandle when reading a gigantic file in Perl
by sandy105 (Scribe) on Jan 30, 2015 at 11:30 UTC
Does while has to parse through all the lines it has already read to go to the next unread line or something like that?
To answer that: it does not! If you are making one pass through the file, it should not slow down much unless it is a really huge file. Since no more info is given, I can only suggest that inside {do something} you use next; to skip lines that are not needed.
Re: How does the while works in case of Filehandle when reading a gigantic file in Perl
by pme (Monsignor) on Jan 30, 2015 at 12:01 UTC