
Re^7: Consumes memory then crashs

by BrowserUk (Pope)
on Mar 24, 2012 at 23:12 UTC (#961448)


in reply to Re^6: Consumes memory then crashs
in thread Consumes memory then crashs

The problem here is that print is not atomic ... Here's a script that provokes it:

Hm. And the fix is sooo complicated:

    use strict;
    use threads;
    use threads::shared;

    my $sem :shared;
    open my $fh, '>', 'outfile' or die $!;
    my $th = 0;
    my @threads = map {
        $th++;
        async( sub {
            sleep(1);
            for(1 .. 30_000) {
                lock $sem;
                print $fh "Thread $th\n"
            }
        } );
    } (1 .. 500);
    $_->join foreach @threads;
    close $fh;

A whole 3 lines.

Regarding modules to facilitate the implementation of said state machine, one I found easy to use (actually the only one I've ever used in production code) is POE::Component::Client::HTTP. (Edited: I originally wrote POE::Component::Client::UserAgent; it's been a while, but the name didn't sound quite right.) POE is rather heavyweight though (not that it mattered much here), so AnyEvent::Curl::Multi might be worth a look too.

Okay, so where's the code? How about you run what you brung?

Betcha don't!

And if you do, betcha it takes you 10 times longer to write; requires 10 times as much (user) code; requires 20 times as many support modules that accumulate to be 30 times as much actual non-core code to trust the authors of and require outside support for when it goes wrong; and finally, runs slower and less efficiently than the 5 minute-to-write, 30 line threaded script above.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?


Re^8: Consumes memory then crashs
by zwon (Monsignor) on Mar 25, 2012 at 05:34 UTC
    Okay, so where's the code?

    You just had to follow the link provided by davido to find an example. It took me 5 minutes to change it, even though I had never used Mojo before:

        use 5.010;
        use strict;
        use warnings;
        use Mojo::UserAgent;
        use Mojo::IOLoop;
        use Mojo::URL;

        # FIFO queue
        my @names = qw(zezima fred bill john jack);

        # User agent with a 1-second inactivity timeout
        my $ua = Mojo::UserAgent->new( inactivity_timeout => 1 );

        sub url_for_name {
            my $name = shift;
            return "http://rscript.org/lookup.php?type=track&time=62899200&user=$name&skill=all";
        }

        # Crawler
        my $crawl;
        $crawl = sub {
            my $id = shift;
            return unless my $name = shift @names;
            say "Looking for $name";

            # Fetch non-blocking just by adding a callback
            $ua->get(
                url_for_name($name) => sub {
                    my ( $ua, $tx ) = @_;
                    my $body = $tx->res->body;
                    if ( $body =~ m/gain:Overall:\d+:(\d+)/i ) {
                        say "$name $1";
                    }
                    elsif ( $body =~ m/(ERROR)/i ) {
                        say "$name doesn't exist";
                    }
                    else {
                        say "$name 0";
                    }

                    # Next
                    $crawl->($id);
                }
            );
        };

        # Start a bunch of parallel crawlers sharing the same user agent
        $crawl->($_) for 1 .. 4;

        # Start reactor
        Mojo::IOLoop->start;

    And if you do, betcha it takes you 10 times longer to write; requires 10 times... less efficiently...

    And that is just rubbish, especially talking about efficiency. I handled hundreds of simultaneous connections with AnyEvent::HTTP, and it didn't really consume a lot of CPU or memory; with threads it would have gone to swap.

      You installed a "real time web framework" -- which, looking at CPAN, installs 270 modules -- in order to download a few files.

      That's just dumb!



        I know Chuck Moore. Chuck Moore's a great guy.

        Your Chuck Moore impression could use some work.

        Seems some people need an explanation for why it is dumb.

        If you prefer to use an event driven method for performing the OP's task, there is absolutely no logic in installing a 270 module "real time web framework" (nor even a 150+ module POE) to do it when LWP::ParallelUA will do the job with just 9 files:

            #! perl -slw
            use strict;
            use LWP::Parallel::UserAgent;
            use Time::HiRes qw[ time ];

            sub lookup {
                my( $lookup, $resp, $prot ) = @_;
                my( $name ) = ${ $resp->{ _request }{ _uri } } =~ m[&user=([^&]+)&];
                print "Response for $name";
                if( $lookup =~ m/gain:Overall:\d+:(\d+)/isg ) {
                    print "$name $1";
                }
                elsif( $lookup =~ m/(ERROR)/isg ) {
                    print "$name doesn't exist "
                }
                else{
                    print "$name 0";
                }
            }

            my @names = do{ local @ARGV = 'firstnames.txt'; <>}; #qw(zezima fred bill john jack);
            chomp @names;

            my $start = time;
            my $pua = LWP::Parallel::UserAgent->new();
            $pua->timeout( 10 );
            $pua->max_req( $ARGV[ 0 ] // 10 );

            $pua->register(
                HTTP::Request->new( 'GET', "http://rscript.org/lookup.php?type=track&time=62899200&user=${_}&skill=all" ),
                \&lookup
            ) for @names;

            my $entries = $pua->wait;

            warn time - $start;

            __END__
            c:\test>PUA-getnames >nul
            91.8953080177307 at C:\test\PUA-getnames.pl line 44, <> line 501.

            c:\test>PUA-getnames 20 >nul
            82.6376140117645 at C:\test\PUA-getnames.pl line 44, <> line 501.

        S'no quicker than threads and not that much smaller, but it does at least work!



Re^8: Consumes memory then crashs
by mbethke (Hermit) on Mar 25, 2012 at 07:24 UTC

    No need to get ironic---my point was merely that threaded code is hard to get right, even for a good and experienced programmer. Your first example illustrated that well, and so does your fix. Here's a comparison of runtimes:

        $ time perl threads_yours.pl
        real    7m13.312s
        user    2m1.484s
        sys     10m8.982s

        $ time perl threads_mine.pl
        real    0m2.209s
        user    0m6.028s
        sys     0m0.604s

    And here's a bit of the output:

        $ uniq -c outfile | head
           2275 Thread 1
              1 ThreaThread 2
            454 Thread 2
              1 TThread 3
            909 Thread 3
              1 ThThread 5
            454 Thread 5
              1 Td 1
           1364 Thread 1
              1 Thread 1hread 2

    This is a bog-standard Perl 5.10.1 on AMD64/i7 as it comes with Debian Squeeze¹. Either you overlooked yet another pitfall (I can't tell what it is---my point again: it looks deceptively simple but ain't) or the library implementation is buggy. Either way, its behavior is certainly more correct than my example's, at the cost of being almost 200 times slower, but it is not quite correct yet.

    As for the POE code, it did indeed take me some 15 minutes to write:

        #!/usr/bin/perl
        use strict;
        use POE qw(Component::Client::HTTP);
        use HTTP::Request;

        my @names = qw/ zezima fred bill john jack /;
        open my $fh, '>', 'outfile' or die $!;

        sub start_req {
            my $name = shift @names or return;
            POE::Kernel->post(weeble => request => response =>
                HTTP::Request->new(
                    GET => "http://rscript.org/lookup.php?type=track&time=62899200&user=$name&skill=all"
                ),
                $name
            );
        }

        POE::Session->create(
            inline_states => {
                _start => sub {
                    POE::Component::Client::HTTP->spawn;
                    start_req for(1 .. 5);
                },
                response => sub {
                    my $name = $_[ARG0]->[1];
                    my $result = $_[ARG1]->[0]{_content};
                    if($result =~ m/gain:Overall:\d+:(\d+)/isg) {
                        print { $fh } "$name $1\n";
                    } elsif($result =~ m/(ERROR)/isg) {
                        print { $fh } "$name doesn't exist \n"
                    } else {
                        print { $fh } "$name 0\n";
                    }
                    start_req;
                },
            },
        );

        POE::Kernel->run;
        close $fh;

    I haven't used that component in the last two years, so yeah, I was slow because I had to look up the defaults and how to pass the HTTP::Request object again. As long as an implementation will not trigger my pager by barfing at 4 in the morning and then look all innocent when I try and debug it, I think that's time well spent.

    As for efficiency: no. The 500 do-almost-nothing threads in your code (mine didn't run long enough to register in top) need a resident set of 327 MB here (virtual size is slightly over 4 GB). I didn't try with the web scraper, but I don't see how it could do any better. If I start 500 parallel requests in the POE version (well, my line here is 256 kbit on a sunny day during low tide ...), it takes 21 MB as opposed to 18 with five requests. Code-size-wise, all POE modules I have installed together are slightly over 40 kLOC including POD. I'll leave the comparison to Perl's thread code, plus pthreads or whatever that builds on on your box, to you. My user code is 36 non-empty lines (OK, I cheated you for two lines in the if/elsif block because that matches my style); yours is 42 so far.

    ¹ 5.14.2 on a newer kernel and a Phenom-II shows the same behavior; it only takes 260 MB but is even slower. I stopped it after over 20 minutes of CPU time.

      my point was merely that threaded code is hard to get right ...

      If you don't follow the rules, the simplest things can screw up:

          perl -e "fork while fork"

      And here's a bit of the output:

      Don't blame Perl or threads because your C runtime libraries are paying lip service to the realities of concurrency.

      If it doesn't take care of things, you'll have to do it.

      The addition of my $old = select $fh; $|++; select $old; might sort out the interleaving problem.
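
A minimal sketch of what that idiom does, assuming the torn lines come from output sitting in Perl's buffer after the lock has already been released. It uses no threads at all; it only shows that setting $| on the handle pushes pending data out and makes each later print reach the file immediately. The scratch file name and the size_of helper are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: current size of a file in bytes, read from disk.
sub size_of { return ( stat $_[0] )[7] }

my $path = 'autoflush_demo.tmp';    # made-up scratch file
open my $fh, '>', $path or die "Cannot open $path: $!";

print {$fh} "buffered\n";
my $before = size_of($path);        # the line is still in Perl's buffer

# The idiom from the post: select the handle, set $|, restore the old one.
my $old = select $fh; $|++; select $old;
my $after_enable = size_of($path);  # setting $| flushes what was pending

print {$fh} "unbuffered\n";
my $after_print = size_of($path);   # this print went to the file at once

close $fh or die "Cannot close $path: $!";
unlink $path;

print "before=$before after_enable=$after_enable after_print=$after_print\n";
```

With the handle unbuffered, a print made while holding the lock should be on its way to the file before the lock is dropped, which is the property the threaded script needs.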

      And use threads stack_size => 4096; will substantially reduce the memory footprint.
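
Putting the two suggestions together with the lock might look like the sketch below. It is not the script from this thread: run_writers, the file name, and the thread and line counts are all hypothetical, and the stack size of 64 * 4096 bytes is only an illustrative value (some platforms enforce a minimum and will round a smaller request up). It needs a perl built with thread support.

```perl
use strict;
use warnings;
use threads ( stack_size => 64 * 4096 );   # illustrative value, not a recommendation
use threads::shared;
use IO::Handle;

my $sem :shared;    # used only as a lock token

# Hypothetical helper: $n_threads writers each print $n_lines lines to $path,
# then the file is read back and returned as a list of lines.
sub run_writers {
    my ( $path, $n_threads, $n_lines ) = @_;

    open my $fh, '>', $path or die "Cannot open $path: $!";
    $fh->autoflush(1);    # same effect as: my $old = select $fh; $|++; select $old;

    my @workers = map {
        my $id = $_;
        threads->create( sub {
            for ( 1 .. $n_lines ) {
                lock $sem;                     # released at the end of each iteration
                print {$fh} "Thread $id\n";    # written out while the lock is held
            }
        } );
    } 1 .. $n_threads;

    $_->join for @workers;
    close $fh or die "Cannot close $path: $!";

    open my $in, '<', $path or die "Cannot read $path: $!";
    chomp( my @lines = <$in> );
    close $in;
    return @lines;
}

my @lines  = run_writers( 'outfile.tmp', 4, 200 );
my @broken = grep { !/^Thread \d+$/ } @lines;
printf "%d lines written, %d malformed\n", scalar @lines, scalar @broken;
unlink 'outfile.tmp';
```

A run this small says nothing about the 500-thread case discussed here; it only shows where the three pieces go and how to check the output for torn lines.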

      But that's all irrelevant. Like saying a bendy-bus is 25x better than a family car because it can carry 100 people.

      The 500 do-almost-nothing threads in your code

      Not my code!

      And what is the point of running 500 threads?

      It takes ~40MB to run 4 threads. That's a whole 2% of the ram of the lowest spec commodity box you'll ever find for sale.

      And with 4 threads, it takes:

          c:\test>junk39 -THREADS=4 >nul
          51.556654214859 at C:\test\junk39.pl line 55, <> line 1

      51 seconds to pull 501 names.

      With 8 threads:

          c:\test>junk39 -THREADS=8 >nul
          49.5222151279449 at C:\test\junk39.pl line 55, <> line 1.

      49 seconds. So doubling the number of threads gained almost nothing. The pipe or the remote server is the limiting factor.
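
junk39.pl itself is not shown in this thread, so the following is only a guess at the general shape of a script with a fixed number of worker threads: a shared Thread::Queue of names that every worker drains. run_pool and lookup_name are hypothetical, and lookup_name is a placeholder that does no HTTP at all. It needs a perl built with thread support.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Placeholder for the HTTP lookup; a real version would fetch and parse a page.
sub lookup_name {
    my ($name) = @_;
    return "$name 0";
}

# Hypothetical pool: $n_workers threads drain one shared queue of names.
sub run_pool {
    my ( $n_workers, @names ) = @_;
    my $queue = Thread::Queue->new;

    my @workers = map {
        threads->create( { context => 'list' }, sub {
            my @found;
            while ( defined( my $name = $queue->dequeue ) ) {
                push @found, lookup_name($name);
            }
            return @found;    # handed back to the parent by join
        } );
    } 1 .. $n_workers;

    $queue->enqueue(@names);
    $queue->enqueue( (undef) x $n_workers );    # one stop marker per worker

    return map { $_->join } @workers;
}

my @results = run_pool( 4, qw(zezima fred bill john jack) );
print "$_\n" for sort @results;
```

With this layout the worker count is a single argument, so comparing 4 against 8 threads is a one-character change, and memory use grows with the number of workers rather than with the number of names.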

      I'd love to compare like with like, but having installed the 131 files that make up the POE behemoth:

          ppm> install 1
          Downloading POE-1.352...done
          Downloading POE-Test-Loops-1.351...done
          Unpacking POE-1.352...done
          Unpacking POE-Test-Loops-1.351...done
          Generating HTML for POE-1.352...done
          Generating HTML for POE-Test-Loops-1.351...done
          Updating files in site area...done
          131 files installed

      I was still missing stuff your script needed:

          c:\test>mbethke.pl
          Can't locate POE/Component/Client/HTTP.pm in @INC (@INC contains: c:/Perl64/site/lib c:/Perl64/lib .)
          BEGIN failed--compilation aborted at (eval 33) line 1.
          could not import qw(Component::Client::HTTP) at C:\test\mbethke.pl line 4
          BEGIN failed--compilation aborted at C:\test\mbethke.pl line 4.

      So, then I tried to download that, but one of its dozen or so dependencies was unavailable:

          1: POE-Component-Client-HTTP
             a HTTP user-agent component
             Version: 0.945
             Released: 2012-03-10
             Author: Rocco Caputo <rcaputo@cpan.org>
             Provide: POE::Component::Client::HTTP version 0.945
             Provide: POE::Component::Client::HTTP::Request version 0.945
             Provide: POE::Component::Client::HTTP::RequestFactory version 0.945
             Provide: POE::Filter::HTTPChunk version 0.945
             Provide: POE::Filter::HTTPHead version 0.945
             Require: HTTP::Headers version 5.81 or better
             Require: HTTP::Request version 5.811 or better
             Require: HTTP::Request::Common version 5.811 or better
             Require: HTTP::Response version 5.813 or better
             Require: HTTP::Status version 5.811 or better
             Require: Net::HTTP::Methods version 5.812 or better
             Require: POE version 1.312 or better
             Require: POE::Component::Client::Keepalive version 0.269 or better
             Require: Socket::GetAddrInfo version 0.19 or better
             Require: Test::More version 0.96 or better
             Require: Test::POE::Server::TCP version 1.14 or better
             Require: URI version 1.37 or better
             Repo: ActiveState Package Repository
             CPAN: http://search.cpan.org/dist/POE-Component-Client-HTTP-0.945/

          ppm> install 1
          ppm install failed: Can't find any package that provides Socket::GetAddrInfo for POE-Component-Client-HTTP
          Can't find any package that provides Socket::GetAddrInfo for POE-Component-Resolver

      It might take a bit less memory, but it certainly won't be quicker because the limitation is the pipe and/or remote server.

      I guess I could try installing the 270 module mojo behemoth, but it never terminates:

          c:\test>zwon >nul
          Too late to run CHECK block at c:/Perl64/site/lib/EV.pm line 84, <> line 501.
          Terminating on signal SIGINT(2)

      One way uses 0.5% of my memory; only needs what came installed with my Perl installation; and works.

      The other two require gobs of extra code and either don't run or never finish. The decision is an easy one for me.

      All that's left for me to do is free up about 50 MB of space on my hard drive by throwing away all the crap installed to write this post:

          ppm> uninstall POE
          POE: uninstalled
          ppm> uninstall mojolicious
          Mojolicious: uninstalled

      There, all done.



        Don't fret over the couple of kilobytes of crap® you had to download. I installed a whole Strawberry Perl just to test your code on Windows :)

        Don't blame Perl or threads because your C runtime libraries are paying lip service to the realities of concurrency.

        Assuming this to be the case, I have bad news for you: so does Windows' C library. Here's a snippet of your "fixed" code's output on the latest Strawberry on XP---the runtime was just as atrocious as on Linux BTW:

            $ uniq -c outfile |head -n10
                819 Thread 2
                  1 ThThread 1
                818 Thread 1
                  1 ThThread 3
                818 Thread 3
                  1 ThThread 4
                818 Thread 4
                  1 ThThread 5
                818 Thread 5
                  1 ThThread 6

        At Thread 20 it gets a bit more irregular ...

        If it doesn't take care of things, you'll have to do it.

        No, you'll have to do it. If you want to show it can be done correctly using threading that is.

        The addition of my $old = select $fh; $|++; select $old; might sort out the interleaving problem.

        That doesn't sound overly confident ...

        And what is the point of running 500 threads?

        Contrary to what you wrote, a thread is not just a means of improving efficiency on multiprocessors. It's just a logical program flow that's supposed to correspond to a certain task, and there are plenty of tasks that could make use of hundreds of threads: simulations, network monitoring, sensor data collection, crawling slow sites, etc. If threading in the "use threads" sense was halfway efficient for it that is.

        It takes ~40MB to run 4 threads. That's a whole 2% of the ram of the lowest spec commodity box you'll ever find for sale.

        You still get 1 GB netbooks, but whatever. So you need only twice the memory to get correct results most of the time, with nondeterministic failures in-between, after the first "Service Pack". At least the other side's IIS doesn't seem to scale either so you don't have to put the thread code to this test. Sorry, even though I have conceded from the start that there are rare cases where threads are the model of choice this is not gonna convince me that we're looking at one.

        As for POE on Windows, on my fresh Strawberry install I can type "cpan POE" and then "cpan POE::Component::Client::HTTP" to end up with a working installation of everything required. Don't blame POE when ActiveState's repository absorbs excrement.

        One way uses 0.5% of my memory; only needs what came installed with my Perl installation; and works. The other two require gobs of extra code and either don't run or never finish. The decision is an easy one for me.

        No, it does not work, at least not reliably. That was the whole point. The failure rate may be acceptable for you but I don't think it's "a simple solution to thread this properly" as the OP asked.
