There are a few odd things about the OP code:
  • HTML::Parse appears to be deprecated; its perldoc page starts with:
    Disclaimer: This module is provided only for backwards compatibility with earlier versions of this library. New code should not use this module, and should really use the HTML::Parser and HTML::TreeBuilder modules directly, instead.

  • This line in your first nested for loop calls a subroutine named "parse_html". Unless the script imports it (HTML::Parse exports a function by that name), I would expect it to turn up as "undefined":
    $c->{data} = $stripper->format(parse_html($c->{data}))
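If the parsing modules turn out to be the culprit, one thing worth checking is whether each parsed tree is freed after use. Here is a minimal sketch of the same step done with HTML::TreeBuilder directly. It assumes HTML::TreeBuilder and HTML::FormatText are installed and that $stripper in the OP code is an HTML::FormatText object (a guess, since that setup isn't shown); html_to_text is a hypothetical helper name.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Hypothetical helper: turn one HTML string into plain text.
sub html_to_text {
    my ($html) = @_;

    # Margins are arbitrary choices for this sketch.
    my $formatter = HTML::FormatText->new( leftmargin => 0, rightmargin => 72 );

    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $text = $formatter->format($tree);

    # Parsed trees hold parent <-> child references. With older
    # HTML::Tree releases they are not freed unless delete() is called.
    $tree->delete;

    return $text;
}

print html_to_text('<p>Hello, <b>world</b></p>');
```

If memory still grows with the explicit delete() in place, the parsing step becomes a less likely suspect.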

Apart from that, I wouldn't know whether the memory leak is due to the "unfinished" query statement objects, as suggested by others above, or whether it's due to stranded (non-garbage-collected) objects in the HTML parsing/formatting modules.

One way to tease that apart would be to split the job into two separate processes. Process 1 parses the HTML data and writes tab-delimited flat tables; process 2 inserts those tables into your database, by any of several easy methods. If process 1 runs to completion without the leak, you can conclude that it was the database interaction that caused it.
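A rough sketch of the output half of the parsing process, assuming the default field and line conventions that MySQL's LOAD DATA and mysqlimport expect (tab-separated fields, newline-terminated rows, backslash escapes, \N for NULL). The helper names tsv_field and tsv_line and the sample rows are made up for illustration:

```perl
use strict;
use warnings;

# Escape one field so it survives a tab-delimited load.
sub tsv_field {
    my ($value) = @_;
    return '\\N' unless defined $value;    # \N marks an SQL NULL
    $value =~ s/\\/\\\\/g;                 # escape backslashes first
    $value =~ s/\t/\\t/g;                  # then embedded tabs
    $value =~ s/\n/\\n/g;                  # and embedded newlines
    $value =~ s/\r/\\r/g;                  # and carriage returns
    return $value;
}

# Join escaped fields into one newline-terminated row.
sub tsv_line {
    return join( "\t", map { tsv_field($_) } @_ ) . "\n";
}

# Made-up sample rows; in the real script these would come from
# the parsed HTML files.
my @rows = (
    [ 1, "first\tdoc", "line one\nline two" ],
    [ 2, 'second',     undef ],
);
print tsv_line(@$_) for @rows;
```

Redirect the output to a file and this process never needs to touch DBI at all, which is what makes the comparison meaningful.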

In any case, a better way to load large quantities of data into MySQL tables is the "mysqlimport" tool that comes with MySQL. It's incredibly fast compared to using Perl/DBI for inserts, and it's the best/easiest way to load a table from a tab-delimited flat-text file. (Rather, Perl/DBI is incredibly slow relative to mysqlimport.)
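For the loading step, a sketch of how a wrapper script might build the mysqlimport call. The database name "mydb" and the file path are placeholders, credentials are assumed to come from an option file such as ~/.my.cnf, and both helper names are hypothetical. Note that mysqlimport picks the target table from the file name with its extension stripped:

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Build the argument list; --local reads the file from the client host.
sub mysqlimport_command {
    my ( $database, $file ) = @_;
    return ( 'mysqlimport', '--local', $database, $file );
}

# The table mysqlimport will load: the file's base name without extension.
sub target_table {
    my ($file) = @_;
    my ($name) = fileparse( $file, qr/\.[^.]*/ );
    return $name;
}

my $file = '/tmp/documents.txt';    # placeholder path
my @cmd  = mysqlimport_command( 'mydb', $file );
print "would load table ", target_table($file), " with: @cmd\n";

# Uncomment to actually run it:
# system(@cmd) == 0 or die "mysqlimport failed: $?";
```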

Another idea: since you are looping over two main directories, you might try just doing one directory per run (giving the desired path name on the command line). If you really want to do both in one run (after solving the memory leak issue), a better loop method would be:

for my $path ( 'modified', 'deleted' ) { ... }
instead of that clunky while-loop.
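Both ideas combine easily. A small sketch, where paths_to_process is a hypothetical helper and the print is a stand-in for the real per-directory work: take the directory names from the command line, and fall back to both when none are given.

```perl
use strict;
use warnings;

# Use the directories named on the command line, or both by default.
sub paths_to_process {
    my @args = @_;
    return @args ? @args : ( 'modified', 'deleted' );
}

for my $path ( paths_to_process(@ARGV) ) {
    print "processing $path\n";    # placeholder for the real work
}
```

Running the script once with "modified" and once with "deleted" then gives each directory its own process, so memory use can be watched for each one separately.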

In reply to Re: Massive Memory Leak by graff
in thread Massive Memory Leak by martin_ldn
