PerlMonks  

Memory Leak HTML::FormatText

by Anonymous Monk
on Sep 13, 2013 at 12:03 UTC (#1053903=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Enlightened Ones,

I have a memory leak in the following code (run on ActiveState Perl and Windows 7). The problem seems to be with the call to HTML::Format. After the end of the loop there still seem to be variables containing text, etc. rather than it all getting properly scoped out (and hence the memory being returned). There is an earlier node addressing a similar problem (Memory Leak? i'm clueless.), but the solution mentioned there (insert a delete command for the input file) does not seem to work for me.

I had a look at the nodes discussing memory leaks in general, but frankly did not find them that useful since they all mention tools such as Devel::Peek, Devel::Cycle... - but I cannot find a description of these tools that I as a newbie can understand.

Any help would be appreciated!
use warnings;
use strict;
use diagnostics;
use HTML::FormatText;

open INPUT, "< D:/htmladdresses.txt" or die "Problem: $!";
our @INPUT = <INPUT>;
close INPUT;

while (@INPUT) {
    my $input = shift(@INPUT);
    chomp $input;
    my $content = HTML::FormatText->format_file($input, leftmargin => 0, rightmargin => 50);
    # this is followed by some regular expression, all disabled now
}

Re: Memory Leak HTML::FormatText
by CountZero (Bishop) on Sep 13, 2013 at 20:19 UTC
    When the loop finishes, your program finishes and all memory will be given back to the operating system. How can there be a memory leak?

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Memory Leak HTML::FormatText
by kcott (Abbot) on Sep 14, 2013 at 02:03 UTC
    "The problem seems to be with the call to HTML::Format. After the end of the loop there still seem to be variables containing text, etc. rather than it all getting properly scoped out (and hence the memory being returned)."

    You're using package variables (INPUT and @INPUT). Their scope is the entire package (i.e. main) and they will persist until the script ends. See "perlmod - Perl modules (packages and symbol tables)".

    What you probably want is lexical variables (see my). Try writing your code along these lines:

    my $file = 'D:/htmladdresses.txt';

    open my $input_fh, '<', $file
        or die "Can't open '$file' for reading: $!";

    while (my $input = <$input_fh>) {
        chomp $input;
        # Use $input here as you were previously using it
    }

    close $input_fh;

    See also: perlsub, perldata, our, local and open.
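    The difference in lifetime can be seen in a small sketch. Tracker is a hypothetical class invented for this illustration; it only counts how many of its objects have been destroyed. The lexical declared with my inside the loop is freed at the end of every iteration, while the package variable declared with our stays alive until the script ends.

```perl
use strict;
use warnings;

# Hypothetical helper class: counts how many of its objects were destroyed.
package Tracker;
our $destroyed = 0;
sub new     { my ($class) = @_; return bless {}, $class }
sub DESTROY { $destroyed++ }

package main;

# Package variable: stays alive until the script ends.
our $kept = Tracker->new;

for my $i (1 .. 3) {
    # Lexical variable: the object is freed at the end of each iteration.
    my $temp = Tracker->new;
}

# prints "destroyed after loop: 3" ($kept has not been destroyed yet)
print "destroyed after loop: $Tracker::destroyed\n";
```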

    "I had a look at the nodes discussing memory leaks in general, but frankly did not find them that useful since they all mention tools such as Devel::Peek, Devel::Cycle... - but I cannot find a description of these tools that I as a newbie can understand."

    You may find Test::LeakTrace is a little easier to use.
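    As a rough illustration, assuming Test::LeakTrace is installed from CPAN, its leaked_count function runs a block and returns the number of values created inside it that were still alive afterwards. The two hashes below are made up for the example and reference each other on purpose, so the block is expected to report a non-zero count.

```perl
use strict;
use warnings;
use Test::LeakTrace;    # CPAN module, not part of core Perl

# A deliberately leaky block: two hashes that reference each other,
# so neither reference count can drop to zero when the block ends.
my $leaky = leaked_count {
    my %a;
    my %b;
    $a{b} = \%b;
    $b{a} = \%a;
};

print "leaked values in the cyclic block: $leaky\n";
```

    Trying this on a block that processes one or two files, rather than all of them, keeps the run short enough to actually reach the report.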

    -- Ken

      Thank you for your reply.

      The first part of the code is at the highest level of the program, so @INPUT is used throughout the while loop, which basically is the whole program. The first part just opens a file with website addresses, which get loaded into @INPUT, and then the loop cycles through them.

      In any case, I have now changed the code to the following:

      use warnings;
      use strict;
      use diagnostics;
      use HTML::FormatText;
      use HTML::TreeBuilder 5 -weak;

      open INPUT, "< D:/websitelocations.txt" or die "Problem: $!";
      my @INPUT = <INPUT>;
      close INPUT;

      while (@INPUT) {
          my $input = shift(@INPUT);
          chomp $input;
          print $input;
          my $content = HTML::FormatText->format_file($input, leftmargin => 0, rightmargin => 50);
          # followed by regular expressions, the results of which are printed into a new file; all of this is currently disabled
      }

      The memory leak is still there though. It runs out of memory after about 3000 files, but I have more than 28 000.

      I still think it has something to do with HTML::FormatText. I read elsewhere that this calls HTML::TreeBuilder, which in the past has caused memory leaks when the object was not explicitly deleted. I have now added

      use HTML::TreeBuilder 5 -weak;

      which should take care of it according to CPAN documentation on HTML::TreeBuilder. However, apparently it does not. I also tried to add explicit calls to the delete function:

      $content->delete(new)

      As well as

      $input->delete(new)

      But this just gives me an error message: Can't locate object method
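A note on the delete attempts: format_file returns the formatted text as a plain string, and $input is a file name, so neither is an object with a delete method. A minimal sketch, assuming it is acceptable to build the tree yourself instead of going through format_file, keeps a handle on the HTML::TreeBuilder object so that it can be deleted explicitly. html_file_to_text is a hypothetical helper name, and this is not necessarily a cure for the leak described above.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Hypothetical helper: convert one HTML file to plain text, then free the tree.
sub html_file_to_text {
    my ($file) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse_file($file)
        or die "Cannot parse '$file': $!";

    my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 50);
    my $text      = $formatter->format($tree);

    # Break the parent/child links so the whole tree can be freed.
    $tree->delete;

    return $text;
}
```

Inside the loop, the call would then read my $content = html_file_to_text($input);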

        What is $content->delete(new) supposed to be or do (what is the string "new")?

        Never mind.

        Here is my tip: do a Data::Dumper of an object after one or 10 files, and look for references.

        Note especially the bless'ed package names.

        Then go write some destructors; it's what I did for bugs in HTML::TableExtract/HTML::TableExtract Memory Usage

        Since you're using sub format_file { I'd copy/paste its source and Dumper the objects involved to find circular-references $VAR1 = { ... \$VAR1 };
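To show what such a circular reference looks like in Dumper output, here is a small self-contained sketch. The parent/child hashes are made up for the illustration and are not taken from HTML::FormatText. It also shows one common remedy, weakening one direction of the cycle with Scalar::Util.

```perl
use strict;
use warnings;
use Data::Dumper;
use Scalar::Util qw(weaken isweak);

# A hypothetical parent/child pair that point at each other.
my $parent = { name => 'parent' };
my $child  = { name => 'child', parent => $parent };
$parent->{child} = $child;

# Dumper prints the back-reference as $VAR1 instead of expanding it again.
local $Data::Dumper::Sortkeys = 1;
print Dumper($parent);

# Weakening one direction of the cycle lets both hashes be freed
# once the last strong reference to them goes away.
weaken($child->{parent});
print "back-reference is weak: ", (isweak($child->{parent}) ? "yes" : "no"), "\n";
```

A 'parent' => $VAR1 line in the dump is the sign to look for: the structure contains a reference back to its own top level.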

      Thank you for pointing out Test::LeakTrace. I have tried this now, using the code from the CPAN example, but I do not receive any report.

      use warnings;
      use strict;
      use diagnostics;
      use HTML::FormatText;
      use HTML::TreeBuilder 5 -weak;
      use constant HAS_LEAKTRACE => eval{ require Test::LeakTrace };
      use Test::More HAS_LEAKTRACE ? (tests => 1) : (skip_all => 'require Test::LeakTrace');
      use Test::LeakTrace;

      leaks_cmp_ok{
          open INPUT, "< D:/websiteadresses.txt" or die "Problem: $!";
          # The file contains 28000 addresses of websites (each one on a new line)
          my @INPUT = <INPUT>;
          close INPUT;

          while (@INPUT) {
              my $input = shift(@INPUT);
              chomp $input;
              print $input;
              my $content = HTML::FormatText->format_file($input, leftmargin => 0, rightmargin => 50);
              # This is followed by regular expressions, the results of which are saved in a new file; all of this is disabled now.
          }
      } '<', 1;

      The program just runs out of memory after about 3000 passes through the while loop, and the program output is not followed by any report from Test::LeakTrace. I also did not see any mention in the CPAN documentation for Test::LeakTrace of a file in which the report is saved. Not sure what to do now...

      I am running this in ActiveState's Komodo, but there is no output from Test::LeakTrace either in the internal output window in Komodo or in the external shell.

        LeakTrace: I now found the following comment in the overall output (just not at the very end, where I expected it):

        "Looks like your test exited with 1 before it could output anything"
