XML::LibXML memory leak

by spstansbury (Monk)
on Dec 07, 2010 at 18:08 UTC ( [id://875849]=perlquestion )

spstansbury has asked for the wisdom of the Perl Monks concerning the following question:

Greetings!

I have a script that processes the output of a vulnerability scanner. When I have a CVE identifier, I look up the CVSS base metrics from the NVD files (nvd.nist.gov).

Everything works just fine, but as I process thousands of records I run out of memory; top shows an ever-increasing VSIZE...

The issue is in this block of code: if I set $cve_id so that this else clause is skipped, the script trundles along happily.

I know that I am declaring a new "my $cve_parser = ...", etc. every time. How can I make sure the data structures are deleted/torn down?

    else {
        # Take the CVE identifier and get the CVSS vectors

        # Parse the CVE identifier to determine what data file to search
        my @record_fields = split( /:/, $cve_id );
        $cve_id = $record_fields[1];
        $cve_id =~ s/^\s+//;
        $cve_id =~ s/\s+$//;
        my @id_fields = split( /-/, $cve_id );
        my $year = $id_fields[1];
        if ($year < 2003) {
            $data_file = "$nvd_files/nvdcve-2.0-2002.xml";
        }
        else {
            $data_file = "$nvd_files/nvdcve-2.0-" . $year . ".xml";
        }

        # Parse the data file:
        my $cve_parser = XML::LibXML->new();
        my $cve_doc = $cve_parser->parse_file( $data_file );
        my $cve_xc = XML::LibXML::XPathContext->new( $cve_doc->documentElement() );

        # Register the namespaces:
        $cve_xc->registerNs( def  => 'http://scap.nist.gov/schema/feed/vulnerability/2.0' );
        $cve_xc->registerNs( vuln => 'http://scap.nist.gov/schema/vulnerability/0.4' );
        $cve_xc->registerNs( cvss => 'http://scap.nist.gov/schema/cvss-v2/0.2' );

        # Find the appropriate CVE entry in the data source:
        for my $entry ($cve_xc->findnodes("/def:nvd/def:entry[\@id = '$cve_id']")) {
            if (my ($metrics) = $cve_xc->findnodes('vuln:cvss/cvss:base_metrics', $entry)) {
                $av = $cve_xc->find('cvss:access-vector', $metrics);
                $ac = $cve_xc->find('cvss:access-complexity', $metrics);
                $au = $cve_xc->find('cvss:authentication', $metrics);
                $ci = $cve_xc->find('cvss:confidentiality-impact', $metrics);
                $ii = $cve_xc->find('cvss:integrity-impact', $metrics);
                $ai = $cve_xc->find('cvss:availability-impact', $metrics);
            }
            else {
                $av = "";
                $ac = "";
                $au = "";
                $ci = "";
                $ii = "";
                $ai = "";
            }
        }
    }

As always, thanks for any and all help!

Scott

Replies are listed 'Best First'.
Re: XML::LibXML memory leak
by ikegami (Patriarch) on Dec 07, 2010 at 22:22 UTC
    The tree can't be freed while you're still using it through $av, $ac, $au, $ci, $ii and $ai.
      It probably could if you detach $av/$ac/$au/$ci/$ii/$ai, perhaps with

          sub XML::LibXML::Node::detach {
              my( $self ) = @_;
              $self->parentNode->removeChild( $self );
          }

        There's no reason to go messing with someone else's namespace.

            sub detach {
                my( $node ) = @_;
                $node->parentNode->removeChild( $node );
            }

            detach($node);

        would work just as well.

        Except that's not enough. removeChild won't separate the node from its owner document.

            $ perl -MXML::LibXML -E'
                my $node;
                {
                    my $xml = "<root><foo><bar/></foo></root>";
                    my $doc = XML::LibXML->new->parse_string($xml);
                    ($node) = $doc->findnodes("//bar");
                    $node->parentNode->removeChild($node);
                }
                {
                    my $doc = $node->ownerDocument;
                    say "owner=", $doc;
                    if ($doc) {
                        say $_->nodeName for $doc->findnodes("//*");
                    }
                }
            '
            owner=XML::LibXML::Document=SCALAR(0x817bcb8)
            root
            foo

        You need to give the node a new document.

            $ perl -MXML::LibXML -E'
                my $foster_home = XML::LibXML::Document->new("1.0", "UTF-8");
                my $node;
                {
                    my $xml = "<root><foo><bar/></foo></root>";
                    my $doc = XML::LibXML->new->parse_string($xml);
                    ($node) = $doc->findnodes("//bar");
                    $node->setOwnerDocument($foster_home);
                }
                {
                    my $doc = $node->ownerDocument;
                    say "owner=", $doc;
                    if ($doc) {
                        say $_->nodeName for $doc->findnodes("//*");
                    }
                }
            '
            owner=XML::LibXML::Document=SCALAR(0x832af38)

        Note that this transfers the node's children too.
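        Since the tree stays alive for as long as anything from it is referenced, another option (not proposed in the thread) is to copy out only the text of each metric rather than keeping nodes or node lists around. The following is a minimal sketch under that assumption: cvss_metrics is a hypothetical helper name, the sample XML is a made-up stand-in for an NVD feed file, and findvalue is used in place of the original find so that only plain strings leave the function.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical helper (not from the thread): look up one CVE in an already
# parsed NVD document and return its CVSS base metrics as plain strings.
sub cvss_metrics {
    my ($doc, $cve_id) = @_;

    my $xc = XML::LibXML::XPathContext->new( $doc->documentElement );
    $xc->registerNs( def  => 'http://scap.nist.gov/schema/feed/vulnerability/2.0' );
    $xc->registerNs( vuln => 'http://scap.nist.gov/schema/vulnerability/0.4' );
    $xc->registerNs( cvss => 'http://scap.nist.gov/schema/cvss-v2/0.2' );

    my @names = qw(
        access-vector access-complexity authentication
        confidentiality-impact integrity-impact availability-impact
    );
    my %metrics = map { $_ => '' } @names;    # default: empty strings

    my ($base) = $xc->findnodes(
        qq{/def:nvd/def:entry[\@id = '$cve_id']/vuln:cvss/cvss:base_metrics} );
    if ($base) {
        # findvalue() yields the string value, so nothing returned from
        # this sub refers back into the document tree.
        $metrics{$_} = $xc->findvalue( "cvss:$_", $base ) for @names;
    }
    return %metrics;
}

# Made-up miniature of an NVD feed, for illustration only.
my $sample = <<'XML';
<nvd xmlns="http://scap.nist.gov/schema/feed/vulnerability/2.0"
     xmlns:vuln="http://scap.nist.gov/schema/vulnerability/0.4"
     xmlns:cvss="http://scap.nist.gov/schema/cvss-v2/0.2">
  <entry id="CVE-2010-0001">
    <vuln:cvss>
      <cvss:base_metrics>
        <cvss:access-vector>NETWORK</cvss:access-vector>
        <cvss:access-complexity>LOW</cvss:access-complexity>
        <cvss:authentication>NONE</cvss:authentication>
        <cvss:confidentiality-impact>PARTIAL</cvss:confidentiality-impact>
        <cvss:integrity-impact>NONE</cvss:integrity-impact>
        <cvss:availability-impact>NONE</cvss:availability-impact>
      </cvss:base_metrics>
    </vuln:cvss>
  </entry>
</nvd>
XML

my %m;
{
    my $doc = XML::LibXML->new->parse_string($sample);
    %m = cvss_metrics( $doc, 'CVE-2010-0001' );
}   # $doc goes out of scope here; %m holds only strings

print "$_=$m{$_}\n" for sort keys %m;
```

        With the real feed, the document would come from parse_file( $data_file ) as in the original post. Keeping the parsed document and the XPath context inside a small scope like this makes it easier to see that nothing outlives the lookup.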

Re: XML::LibXML memory leak
by Jenda (Abbot) on Dec 12, 2010 at 12:02 UTC

    Use XML::Twig or XML::Rules and process the file in chunks. By the time your loop runs it's already too late and the memory has already been wasted.
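
    As a rough illustration of the chunked approach with XML::Twig, the sketch below handles one entry element at a time and purges what has already been processed, so the whole feed is never held in memory. It rests on several assumptions: collect_metrics and its arguments are hypothetical names, the sample XML is a made-up stand-in for an NVD file, and the element names are matched with the prefixes as they appear in the file (XML::Twig's default when no namespace mapping is configured).

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: stream an NVD document (passed as a string here) and
# return { cve_id => { metric_name => text, ... } } for the wanted CVE ids.
sub collect_metrics {
    my ($xml, $wanted) = @_;    # $wanted: hashref whose keys are CVE ids

    my @names = qw(
        access-vector access-complexity authentication
        confidentiality-impact integrity-impact availability-impact
    );
    my %found;

    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called once per <entry>, as soon as its end tag is parsed.
            entry => sub {
                my ($t, $entry) = @_;
                my $id = $entry->att('id');
                if ( defined $id && $wanted->{$id} ) {
                    my $base = $entry->first_descendant('cvss:base_metrics');
                    if ($base) {
                        $found{$id} = {
                            map { $_ => $base->first_child_text("cvss:$_") } @names
                        };
                    }
                }
                $t->purge;      # discard everything parsed so far
            },
        },
    );
    $twig->parse($xml);         # for a file on disk, parsefile($path) instead
    return \%found;
}

# Made-up miniature of an NVD feed, for illustration only.
my $feed = <<'XML';
<nvd xmlns="http://scap.nist.gov/schema/feed/vulnerability/2.0"
     xmlns:vuln="http://scap.nist.gov/schema/vulnerability/0.4"
     xmlns:cvss="http://scap.nist.gov/schema/cvss-v2/0.2">
  <entry id="CVE-2010-0001">
    <vuln:cvss>
      <cvss:base_metrics>
        <cvss:access-vector>NETWORK</cvss:access-vector>
        <cvss:access-complexity>LOW</cvss:access-complexity>
      </cvss:base_metrics>
    </vuln:cvss>
  </entry>
  <entry id="CVE-2010-0002">
    <vuln:cvss>
      <cvss:base_metrics>
        <cvss:access-vector>LOCAL</cvss:access-vector>
        <cvss:access-complexity>HIGH</cvss:access-complexity>
      </cvss:base_metrics>
    </vuln:cvss>
  </entry>
</nvd>
XML

my $metrics = collect_metrics( $feed, { 'CVE-2010-0002' => 1 } );
for my $id ( sort keys %$metrics ) {
    print "$id access-vector=$metrics->{$id}{'access-vector'}\n";
}
```

    Collecting all the CVE ids for a given year first and making a single pass over that year's file would also avoid re-parsing the same feed for every record.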

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.
