Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change

CPU Leaking Threads

by Trizor (Pilgrim)
on Feb 21, 2007 at 22:21 UTC ( #601441=perlquestion: print w/replies, xml ) Need Help??

Trizor has asked for the wisdom of the Perl Monks concerning the following question:


I have written a centralized tracker for several datasources, and it gets updated by tcp/ip connections. It then outputs an XML file if the update is of note, otherwise simply caches the xml in an in memory representation. The problem isn't memory usage, we knew it was going to be large in respect to memory from the get go. The problem is as this program runs it begins to gobble more and more cpu. Before I killed it it went from 100% to 171% cpu usage in about a minute.

Here is the shell of it, the inner workings are proprietary.

Do any more enlightened monks know why its CPU usage would just go up up up when there was no information being sent to it?

Update: Arch and build might be relevant
 perl, v5.8.5 built for i386-linux-thread-multi
use strict; use warnings; use threads qw(yield); use threads::shared; use Thread::Queue; use IO::Socket; use Net::hostent; use XML::Parser; use XML::XPath; # server and client handle declaration # Also summons listener socket my ($server,$client); $server = IO::Socket::INET->new( Proto => 'tcp', LocalPort => PORT, Listen => SOMAXCONN, Type => SOCK_STREAM, Reuse => 1) or die "MAIN: Ca +n't bind socket: $!"; #DoneFlag, raised when the XML File is done my $DoneFlag : shared; #Timeout flag, raised upon timing out my $Timeout : shared; #Data Queue, this is the queue that pulled in client data goes into #Spawn Process Child my $child = threads->new(\&Process,$DataQueue); #Spawn Timeout timer and detach my $timeoutTime = time + $opt_timeout; my $timer = threads->create(\&Timer,$timeoutTime); $timer->detach(); #Spawn Main reaper and detach my $reap = threads->create(\&MainReaper); $reap->detach(); my $DataQueue = Thread::Queue->new; #The main loop, while we accept from the server socket. print "MAIN: Looking.\n"; LOOK: while ($client = $server->accept()) { #Check for timeout if ($Timeout) { print "MAIN: Timed out.\n"; last LOOK; } #Check for done if ($DoneFlag) { print "MAIN: Child said done.\n"; last LOOK; } # Spawn a parser child to handle the connection. print "MAIN: Spawning ParserChild.\n"; my $unimpChild = threads->new(\&Parse,$client,$DataQueue); print "MAIN: Done Listening. Cleaning up.\n"; #Join every thread to cleanup foreach my $thr (threads->list()) { if ($thr->tid && !threads::equal($thr, threads->self) && !threads::e +qual($thr,$child) && !threads::equal($thr,$timer)) { $thr->join; } } #Kill Process with the terminating undef, then join $DataQueue->enqueue(undef); my $ret = $child->join(); #Close the server socket close $server; print "MAIN: Done Looking.\n"; #Here ends the main loop sub Process { my $queue = shift; my $dat; # Set up XML::XPath to run in a creation mode with a base of <base/> my $parser = XML::XPath::XMLParser->new(xml=>qq|<base/>|); my $xp = XML::XPath->new(); my $xmlRoot = $parser->parse; my $docRoot = $xp->find(q{/base},$xmlRoot)->shift(); # Dequeue data until its undef while ($dat = $queue->dequeue()) { #do some preprocessing, create @data from $dat # For every param passed by a parser foreach (@data) { last unless /\S/; #We're done if its blank #parse out data to place into XML #uses the SWITCH: { /foo/ && do BLOCK } construct to do this #Make sure we didn't get garbage if(defined($parsedFoo)) { #in here the following xml calls are made: # exists, if it returns true: # find, getAttribute, setAttribute 4 times # else # XML::XPath::Node::Element->new() # setAttribute 5 times # appendChild } else { warn "BAD DATA"; } } continue { open XML,"> $fileloc" or die "FATAL ERROR CANNOT XML OUTPUT"; print XML q|<?xml version="1.0"?>|.$xmlRoot->toString; close XML; # A done check goes here, basically we do 2 xpath queries and if + they return > a certain number of nodes we're done and we exit this +thread. } } return; } sub Parse { sub ParseFL { my $sock = shift; my $Dqueue = shift; my $ret; # Read all the data the socket has to say while(<$sock>) { last unless /\S/; #Blank line means done chomp; #Strip the newline # Pre-pre processing and addition to $ret } close $sock; $Dqueue->enqueue($ret) if $ret; # Send it on to Process if it exists return; } sub Timer { my $outtime = shift; my $difference = $outtime - time(); print "Timer: timing until now is $outtime.\n"; $difference = $outtime - time(),threads->yield() until($difference < += 0 || $DoneFlag); unless($DoneFlag) { $Timeout = 1; print "Timer: Setting Timeout Flag.\n"; } } sub MainReaper { #Wait threads->yield() until ($DoneFlag || $Timeout); #Gank print "MainReaper: Reaping the main loop with a blank packet.\n"; $reap = IO::Socket::INET->new( Proto => 'tcp', PeerAddr => 'localhos +t', PeerPort => '77777'); }

Replies are listed 'Best First'.
Re: CPU Leaking Threads
by BrowserUk (Pope) on Feb 21, 2007 at 22:39 UTC

    Change the yield in your MainReaper thread to:

    sleep 1 until ($DoneFlag || $Timeout);

    Yield just relinguishes the rest of the threads timeslice. It immediately becomes eligible for rescheduling. That means that if no other thread is available to run, you've coded a very tight loop that will happily consume as much cpu as is available. By using a sleep, you will avoid thrashing the cpu. If you need finer than 1 second resolution, use Time::HiRes::sleep or the 4-arg select.

    Note: This is based upon my observations of what happens on Win32. Your OS may work differently.

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      And in reply to myself, the same change should be made in the Timer routine.
        the same change should be made in the Timer routine.

        Yes indeed. I didn't notice the second yield() tucked away there in that compound statement.

        Out of interest. Did those changes fix the problem? Are you happy with the performance and function of the resulting code?

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: CPU Leaking Threads
by renodino (Curate) on Feb 22, 2007 at 01:36 UTC
    What version of threads are you using ? Have you installed the latest from CPAN (*many* fixes and features have been applied since Perl 5.8.5's core threads release).

    That said, I'd suggest you revisit your architecture. Obviously, there are many pieces I'm not privy to, but your current code is very non-deterministic, and I'd consider it a bit dangerous to put into production. Have you considered combining the Parse and Process inside individual threads, with a lock around the output file, and creating a fixed size pool of threads executing the resulting method ? You'd have to pass the new sockets to the threads as fileno's and reconstitute them via fdopen(), but that's still likely to be faster than constantly spawning/destroying new threads, and certainly more manageable/debuggable. It may not solve your immediate issue, but I suspect you'll find the cause faster.

    Interestingly, a recent journal pointed to some related threads links, esp. the unfortunately titled 'The Problem With Threads'. In brief, nondeterminism is bad, and in your case, possibly unsafe.

    One more suggestion: try monitoring with Devel::STrace. I used it to diagnose a similar problem with a highly threaded app...turned out to be some unexpected behavior when the socket peer dropped the connection unexpectedly. (However, its also likely sensitive to your transient thread architecture).

    One last hint: assuming you use a newer version of threads, you may want to adjust the thread stacksize (depending on the platform) to help reduce footprint.

    Perl Contrarian & SQL fanboy

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://601441]
Approved by Paladin
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (3)
As of 2021-06-13 09:25 GMT
Find Nodes?
    Voting Booth?
    What does the "s" stand for in "perls"? (Whence perls)

    Results (54 votes). Check out past polls.