Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Re^2: How to deal with the fact that Perl is not releasing memory

by carlbolduc (Novice)
on Jul 06, 2013 at 15:45 UTC ( [id://1042921]=note: print w/replies, xml ) Need Help??


in reply to Re: How to deal with the fact that Perl is not releasing memory
in thread How to deal with the fact that Perl is not releasing memory

I will investigate potential memory leaks.

***EDIT*** Here is the complete code:

package CommunicationStream; use Dancer ':syntax'; use Dancer::Plugin::Ajax; use URI::Escape; use WWW::Mechanize; use HTML::TreeBuilder; use HTML::FormatText; use DateTime; use DateTime::Format::Strptime; use SOAP::Lite; our $VERSION = '0.1'; $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0; # SOAP Lite configuration #my $soap = SOAP::Lite... my $firstResult = 0; get '/:ref' => sub { my @messages = (); my ($modifiedDate, $quickViewLinks, $linkToDocuments, $outlookProt +ocolLinks, $containsAttachment) = getCachedDocumentUris(params->{ref} +, 0, 5); #fix this to use the number of documents my $i = 0; foreach my $quickViewLink (@{$quickViewLinks}) { my $text = grab_page($quickViewLink); my $datetime = getDisplayDate(${$modifiedDate}[$i]); my ($date, $timestamp) = split(/\|/, $datetime); if (index($text, "class=\"email") != -1) { if (${$containsAttachment}[$i] eq "true") { push(@messages, {date => $date, timestamp => $timestamp, l +ink => "<a href=\"" . ${$outlookProtocolLinks}[$i] . "\"><img src=\"/ +images/emailInbound.gif\" /></a>", text => $text, attachment => "<img + src=\"/images/emailHasAttach.gif\" />"}); } else { push(@messages, {date => $date, timestamp => $timestamp, + link => "<a href=\"" . ${$outlookProtocolLinks}[$i] . "\"><img src=\ +"/images/emailInbound.gif\" /></a>", text => $text}); } } elsif (index($text,"||") != -1) { my @linkAndText = split(/\|\|/, $text); push(@messages, {date => $date, timestamp => $timestamp, l +ink => $linkAndText[0], text => $linkAndText[1]}); } else { push(@messages, {date => $date, timestamp => $timestamp, l +ink => "<a href=\"" . ${$linkToDocuments}[$i] . "\"><img src=\"/image +s/comment_icon.gif\" /></a>", text => $text}); } $i++; } template 'index' => {messages => \@messages}; }; ajax '/getNextResults:firstResultAndRef' => sub { my $content = ''; my ($salesforceRef, $firstResult) = split(/\|/, params->{firstResu +ltAndRef}); my @messages = (); my ($modifiedDate, $quickViewLinks, $linkToDocuments, $outlookProtoc +olLinks) = getCachedDocumentUris($salesforceRef, $firstResult, 5); my $i = 0; foreach my $quickViewLink (@{$quickViewLinks}) { my $text = grab_page($quickViewLink); my $datetime = getDisplayDate(${$modifiedDate}[$i]); my ($date, $timestamp) = split(/\|/, $datetime); if (index($text, "class=\"email") != -1) { push(@messages, {date => $date, timestamp => $timestamp, l +ink => "<a href=\"" . ${$outlookProtocolLinks}[$i] . "\"><img src=\"/ +images/emailInbound.gif\" /></a>", text => $text}); } elsif (index($text,"||") != -1) { my @linkAndText = split(/\|\|/, $text); push(@messages, {date => $date, timestamp => $timestamp, l +ink => $linkAndText[0], text => $linkAndText[1]}); } else { push(@messages, {date => $date, timestamp => $timestamp, l +ink => "<a href=\"" . ${$linkToDocuments}[$i] . "\"><img src=\"/image +s/comment_icon.gif\" /></a>", text => $text}); } $i++; } $content = template 'messages' => {messages => \@messages}; { content => $content } ; }; ajax '/getAllResults:firstResultAndRef' => sub { my $content = ''; my ($salesforceRef, $firstResult) = split(/\|/, params->{firstResu +ltAndRef}); my @messages = (); my ($modifiedDate, $quickViewLinks, $linkToDocuments, $outlookProtoc +olLinks) = getCachedDocumentUris($salesforceRef, $firstResult, 1000); my $i = 0; foreach my $quickViewLink (@{$quickViewLinks}) { my $text = grab_page($quickViewLink); my $datetime = getDisplayDate(${$modifiedDate}[$i]); my ($date, $timestamp) = split(/\|/, $datetime); if (index($text, "class=\"email") != -1) { push(@messages, {date => $date, timestamp => $timestamp, l +ink => "<a href=\"" . ${$outlookProtocolLinks}[$i] . "\"><img src=\"/ +images/emailInbound.gif\" /></a>", text => $text}); } elsif (index($text,"||") != -1) { my @linkAndText = split(/\|\|/, $text); push(@messages, {date => $date, timestamp => $timestamp, l +ink => $linkAndText[0], text => $linkAndText[1]}); } else { push(@messages, {date => $date, timestamp => $timestamp, l +ink => "<a href=\"" . ${$linkToDocuments}[$i] . "\"><img src=\"/image +s/comment_icon.gif\" /></a>", text => $text}); } $i++; } $content = template 'messages' => {messages => \@messages}; { content => $content } ; }; sub getResults { my ($query, $firstResult, $numberOfResults) = ($_[0], $_[1], $_[2] +); my @params = (SOAP::Data->name("p_Params"=> \SOAP::Data->value(SOAP::Data-> +name("BasicQuery" => $query) -> type("string"), SOAP::Data-> +name("NumberOfResults" => $numberOfResults), SOAP::Data-> +name("SortCriteria" => "ModifiedDateDescending"), SOAP::Data-> +name("NeedCachedDocumentUris" => "true"), SOAP::Data-> +name("FirstResult" => $firstResult), SOAP::Data-> +name("TimeZoneOffset" => 4), SOAP::Data-> +name("NeededFields" => + \SOAP::Data->name("string" => "\@sysoutlookuri"))))); my $result = $soap->call($method => @params); return $result; } sub getCachedDocumentUris { my @URIs; my ($salesforceRef, $firstResult, $numberOfResults) = ($_[0], $_[1 +], $_[2]); my $SearchResults = getResults("$salesforceRef \@sysfiletype==(exc +hangemessage,SFCaseComment)", $firstResult, $numberOfResults); my @modifiedDate = $SearchResults->valueof('//PerformQueryResponse +/PerformQueryResult/Results/QueryResult/ModifiedDate'); my @tmpURIs = $SearchResults->valueof('//PerformQueryResponse/Perf +ormQueryResult/Results/QueryResult/CachedDocumentUri'); my @clickableURIs = $SearchResults->valueof('//PerformQueryRespons +e/PerformQueryResult/Results/QueryResult/TargetUri'); my @outlookURIs = $SearchResults->valueof('//PerformQueryResponse/ +PerformQueryResult/Results/QueryResult/Fields/ResultField/Value'); my @containsAttachment = $SearchResults->valueof('//PerformQueryRe +sponse/PerformQueryResult/Results/QueryResult/ContainsAttachment'); foreach my $URI (@tmpURIs) { my ($LeftURI, $RemainingURI) = split('&docid=', $URI); my ($MiddleURI, $RightURI) = split('&q=', $RemainingURI); $MiddleURI =~ s/:/%3A/g; $MiddleURI =~ s/@/%40/g; $MiddleURI =~ s/\//%2F/g; $MiddleURI =~ s/\$/%24/g; $MiddleURI =~ s/é/%C3%A9/g; $MiddleURI =~ s/\+/%2b/g; $MiddleURI =~ s/Messages%2bTrait%C3%A9s/Messages\+Trait%C3%A9s +/g; #push (@URIs, uri_escape($URI)); push (@URIs, $LeftURI . '&docid=' . $MiddleURI . '&q=' . $Righ +tURI); } return (\@modifiedDate, \@URIs, \@clickableURIs, \@outlookURIs, \@ +containsAttachment); } sub grab_page { my @SupportAgents = ("Carl Bolduc", "Second Dude", "Third Dude"); # using shift to accept parameter passed to method my $urlString = shift; my $username = "domain\\user"; my $password = "great password"; my $mech = WWW::Mechanize->new(autocheck => 0); $mech->agent('Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/ +20100101 Firefox/4.0.1'); $mech->credentials($username, $password); my $response = $mech->get($urlString); my $page_contents = $mech->content(); my $Format = HTML::FormatText->new(leftmargin => 0); my $TreeBuilder = HTML::TreeBuilder->new; $TreeBuilder->parse($page_contents); my $Parsed = $Format->format($TreeBuilder); my (@EmailsEnglish, @EmailsFrench); my $FirstEmail; # is this an email exchange if (index($Parsed,"From:") != -1) { @EmailsEnglish = split(/From:/,$Parsed); $FirstEmail = $EmailsEnglish[0]; $FirstEmail =~ s/\n/<br>/g; $FirstEmail =~ s/-+\s?Original Message\s?-+//g; $FirstEmail =~ s/(<br>){3,}/<br>/g; return "<div class=\"email\">$FirstEmail</div>"; } elsif (index($Parsed,"De:") != -1) { @EmailsFrench = split(/De:/,$Parsed); $FirstEmail = $EmailsFrench[0]; $FirstEmail =~ s/\n/<br>/g; return "<div class=\"email\">$FirstEmail</div>"; } # is this a mantis exchange elsif (index($Parsed,"================") != -1) { my $LastNote = substr($Parsed,rindex($Parsed,"---------------- +------------------------------------------------------",rindex($Parse +d, "----------------------------------------------------------------- +-----") -1)); #$LastNote =~ s/-{11,}//g; $LastNote = substr($LastNote,0,index($LastNote,"===========")) +; my $LastNoteAuthor = substr($LastNote,0,rindex($LastNote,"---- +------------------------------------------------------------------")) +; $LastNoteAuthor =~ s/-{11,}//g; # Extract the link to the mantis comment my $MantisLink = ""; if ($LastNoteAuthor =~ m/http:\/\/mantis\/view.php\?id=\d*#c\d +*/) { $MantisLink = $&; } # Detect if Dev or Support is speaking foreach (@SupportAgents) { if (index($LastNoteAuthor,$_) != -1) { $LastNoteAuthor = "Support:"; last; } else { $LastNoteAuthor = "Dev:"; } } my $LastNoteMessage = substr($LastNote,rindex($LastNote,"----- +-----------------------------------------------------------------")); $LastNoteMessage =~ s/-{11,}//g; $LastNoteMessage =~ s/\n/<br>/g; $LastNoteMessage =~ s/(<br>){3,}/<br>/g; return "<a href=\"" . $MantisLink . "\" target=\"_blank\"><img + src=\"/images/mantis.gif\" /></a>||<div class=\"mantis\">" . $LastNo +teAuthor . "<br>" . $LastNoteMessage . "</div>"; #return "<div class=\"mantis\">" . $LastNoteAuthor . "<br>" . +$LastNoteMessage . "</div>"; } else { $Parsed =~ s/\n/<br>/g; $Parsed =~ s/(<br>){3,}/<br>/g; #salesforce comment return "<div class=\"comment\">$Parsed</div>"; } } sub getDisplayDate { my $modifiedDate = $_[0]; #"20090103 12:00"; my $format = new DateTime::Format::Strptime( pattern => '%Y-%m-%dT%H:%M:%S', time_zone => 'GMT', ); my $date = $format->parse_datetime($modifiedDate); #print $date->strftime("%Y%m%d %H:%M %Z")."\n"; $date->set_time_zone("America/New_York"); return $date->day_abbr().", ".$date->month_abbr()." ".$date->strft +ime("%d|%H:%M"); } true;

My understanding was that each Ajax request was reassigning the variables. Reading all the comments here, I am definitely doing something wrong.

Thank you kcott.

Replies are listed 'Best First'.
Re^3: How to deal with the fact that Perl is not releasing memory
by kcott (Archbishop) on Jul 07, 2013 at 07:56 UTC

    Firstly, I'm not a Dancer user. What follows is just a suggestion for something to try; it may not help at all!

    In each of your coderefs (get ... sub { ...} [1 instance] and ajax ... sub { ...} [2 instances]), you have code like this:

    my @messages = (); ... template '...' => {messages => \@messages};

    When they (implicitly) return, @messages will not be garbage-collected while the reference (\@messages) still exists.

    I don't know why that reference might still exist. I followed it through Dancer (1.3116) source:

    sub template { Dancer::Template::Abstract->template(@_) }

    and then through Dancer::Template::Abstract (1.3116) source:

    sub template { my ($class, $view, $tokens, $options) = @_; ...

    but then, not knowing what templating engine you were using, was unable to continue. I'll leave that as an exercise for you.

    You can ensure @messages is garbage-collected by using this code instead (which contructs an anonymous arrayref):

    my @messages = (); ... template 'XXX' => {messages => [ @messages ]};

    Expanding @messages will incur some overhead but that appears to be very small. Also, you might just be deferring your memory leak problem.

    Be aware that memory may not be leaked on every cycle: a linear increase in memory usage may not be occurring at all. It might happen in code that's only run conditionally or perhaps as a result of some exception.

    -- Ken

Re^3: How to deal with the fact that Perl is not releasing memory
by Anonymous Monk on Jul 06, 2013 at 21:02 UTC
    Everything about the code you posted looks fine -- I'm not seeing any stray variables or anything and everything looks self-contained. You need to post more.

    How large can the result of grab_page() be? Expect your peak memory use to be that number multiplied by 3 or so. (Even then it should not slow the script down.)

    One bit that looks wasteful are these parts that cause a fairly large string copy (I think):

    $content = template 'messages' => {messages => \@messages}; { content => $content }
    Throw away the temporary variable and I don't think it'll fix your problem but it might slow down the memory bloat a bit:
    { content => template( 'messages' => {messages => \@messages} ) }

      I edited my reply to include the additional subs.

      grab_page() can sometimes fetch a humongous email thread. It is also called several hundreds of times a day at work.

      A bit of background... Our search engine indexes the emails that we exchange with our clients. Doing so, it also generates a HTML version of each email. By querying the search engine with the support case number (I'm a support agent), I can get all the exchanges of a support ticket in the correct order. By fetching the HTML version and by extracting the first part of each email thread, I can reconstitute a perfect flow of communication. This is very useful when grabbing the case of another support agent for example.

      After watching the video mentioned by Dave, it seems that all the memory used by those ajax calls will remain into malloc and not returned to the OS.

      Am I missing something?

        Anything allocated will not be released to the operating system. See e.g. http://c-faq.com/malloc/free2os.es.html. The exception to this is mmapped memory, typically used for fast file access.

        I suspect your primary problem is the actual parsing of the humongous HTML pages. It just takes a lot of memory -- but the memory should still be freed and reused by the Perl interpreter. But I found this choice quote from HTML::TreeBuilder:

        4. previous versions of HTML::TreeBuilder required you to call $tree->delete() to erase the contents of the tree from memory when you're done with the tree. This is not normally required anymore. See "Weak References" in HTML::Element for details.

        One general technique for keeping the memory use to a minimum is to fork off a child for every (large) request, and pass the bare-minimum result (here, @messages) to the parent. It's far from elegant and relatively difficult to implement, and the peak memory use will still be the same -- just not in the parent process. A last-resort option.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1042921]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (3)
As of 2024-03-19 05:06 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found