Re: Massive Memory Leak

by martin_ldn (Initiate)
on Dec 08, 2009 at 10:51 UTC

in reply to Massive Memory Leak

Hey all,

Thanks for the responses :-) I have identified the problem! It was with HTML::Parse. Instead I have used HTML::Strip and this not only fixes the memory leak but is also massively faster.

Thank you for pointing out the other bits too. As I say, I am new to Perl, so they are very helpful. I will look into the security problems, but as the machine is completely isolated (no network connection) I'm not sure this is an immediate problem. I will bear it in mind though. And graff, I like that loop more - it looks more elegant!


Re^2: Massive Memory Leak
by afoken (Abbot) on Dec 08, 2009 at 13:03 UTC

    Not using placeholders is not only a security problem. When you use placeholders, you allow DBI, DBD::whatever, and the database to cache a parsed form of your query. This can speed things up dramatically, even with simple SQL statements.

    And you can get rid of quoting problems entirely for values you want to pass to the database. Use a placeholder and pass the actual value to execute(), no matter what it contains. You don't even have to know what quoting rules apply to your database.
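    A minimal sketch of the prepare-once, execute-many pattern described above. It assumes DBD::SQLite is installed and uses an in-memory database so it runs without a server; the items table and its columns are hypothetical and not taken from the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is available. The table and column names
# below are made up for illustration only.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, PrintError => 0 });

$dbh->do('CREATE TABLE items (name TEXT, title TEXT)');

# Prepare once; the same statement handle is reused for every row.
my $sth = $dbh->prepare('INSERT INTO items (name, title) VALUES (?, ?)');

my @rows = (
    [ 'plain',  'Nothing special' ],
    [ 'quoted', q{It's got a ' and a \\ in it} ],
);

# The values travel separately from the SQL text, so no manual quoting.
$sth->execute(@$_) for @rows;

# Read the awkward value back; it should be stored unchanged.
my ($title) = $dbh->selectrow_array(
    'SELECT title FROM items WHERE name = ?', undef, 'quoted');
print "$title\n";

$dbh->disconnect;
```

    The same shape should carry over to DBD::mysql: only the connect() arguments and the SQL change, while the execute() calls stay free of quoting logic.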

    Background information:
    For most databases, the DBD can pass SQL statement and values separately to the database, so even the DBD does not need to know quoting rules. The database can cache a precompiled version of the query, and needs to parse the query only once, no matter how often you use it. For those unlucky databases that do not support placeholders, the DBD provides all required quoting rules, and DBI and DBD take care of injecting properly quoted values into the query. At this point, at least DBI and DBD can cache a precompiled version of the query, so DBI and DBD are still more efficient in that worst case than your code. And because a lot of the DBI/DBD code is written in C / XS, it is usually much faster than anything you can code in Perl.

    Oh, and by the way: What happens if one of the values you want to insert contains a single quote? Right, your code dies, because you do not quote properly. If you still insist on quoting your values manually, at least use DBI's quote method to quote the values properly.
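    To illustrate the single-quote failure and the quote() fallback, here is a small sketch. It again assumes DBD::SQLite and a hypothetical items table; the exact error text and the escaping that quote() produces depend on the driver, so neither is shown here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is available; the table is hypothetical.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, PrintError => 0 });

$dbh->do('CREATE TABLE items (name TEXT)');

my $name = q{O'Reilly};

# Naive interpolation: the embedded quote ends the string literal early,
# the statement no longer parses, and RaiseError turns that into a die.
my $ok = eval { $dbh->do("INSERT INTO items (name) VALUES ('$name')"); 1 };
print "naive insert failed: $@" unless $ok;

# quote() returns the value already wrapped and escaped for this driver,
# so it is interpolated without adding quotes around it.
my $quoted = $dbh->quote($name);
$dbh->do("INSERT INTO items (name) VALUES ($quoted)");

my ($stored) = $dbh->selectrow_array('SELECT name FROM items');
print "stored: $stored\n";

$dbh->disconnect;
```

    Placeholders remain the simpler choice; quote() is mainly useful where SQL text has to be assembled by hand.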


    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      There are some fair points there. To get around the ' (or \) problem I simply replaced the character with a - beforehand, which was a workaround. I have now looked at the DBI docs and have modified my program to be much better in terms of architecture and elegance. I have attached it in case it is of help to someone else :-) Check it out!

      #!/usr/bin/perl
      use HTML::TableContentParser;
      use HTML::Strip;
      use DBI;
      use strict;
      use warnings;

      # Connect to database and create parser object
      my $db = DBI->connect("DBI:mysql:newsbms", "newsbms", "newsbms",
          { RaiseError => 1, PrintError => 0 });

      for my $path ('modified', 'deleted') {
          print "\nProcessing the '$path' entries...\n\n";

          # Create counters
          my $counter       = 0;
          my $query_counter = 0;

          # Open the directory
          my $dirname = "/home/martinn/monitoring/newsBMS/$path/";
          opendir(DIR, $dirname) || die("Could not open $dirname");

          # Prepare the MySQL statement
          my $query = "INSERT INTO";
          if ($path eq 'modified') {
              $query = $query . " modified (id, name, title, duration, library, modified, user, rev) VALUES ( ?, ?, ?, ?, ?, ?, ?, ? )";
          }
          if ($path eq 'deleted') {
              $query = $query . " deleted (name, title, duration, deleted, library) VALUES ( ?, ?, ?, ?, ? )";
          }
          $query = $query . " ON DUPLICATE KEY UPDATE duplicates=duplicates+1";
          my $statement = $db->prepare($query);

          # Loop through all files in the directory
          while (defined(my $filename = readdir(DIR))) {
              # Skip special "files": '.' and '..'
              next if $filename =~ /^\.\.?$/;
              $counter++;

              # Open and read the html file into a single string
              open(HTMLFILE, $dirname.$filename) || die("Couldn't open $filename");
              binmode HTMLFILE;
              my $html = join("", <HTMLFILE>);
              close(HTMLFILE);

              # Parse the html table
              my $tcp    = HTML::TableContentParser->new;
              my $tables = $tcp->parse($html);

              # Issue the MySQL queries
              for my $t (@$tables) {
                  for my $r (@{ $t->{rows} }) {
                      my @values;
                      for my $c (@{ $r->{cells} }) {
                          # Remove the html tags from the cells
                          my $stripper = HTML::Strip->new();
                          $c->{data} = $stripper->parse($c->{data});
                          # Add cell to the end of the array
                          push(@values, $c->{data});
                      }
                      $statement->execute(@values);
                      $query_counter++;

                      # Basic activity monitor
                      if ($query_counter % 5000 == 0) {
                          print "Issued $query_counter MySQL queries.\n";
                      }
                  }
              }
          }

          # Close the directory
          closedir(DIR);

          # Finish the MySQL statement
          $statement->finish();

          print "\nDone the '$path' table.\n";
          print "Processed $counter files and issued $query_counter MySQL queries.\n";
      }

      # Disconnect from the database
      $db->disconnect();
      print "\nProgram Finished.\n";
