PerlMonks  

Re: Perl crashing with Parallel::ForkManager and WWW::Mechanize

by BrowserUk (Patriarch)
on Aug 04, 2012 at 12:13 UTC


in reply to Perl crashing with Parallel::ForkManager and WWW::Mechanize

Windows & fork are like windows & stones; mix them and something's gonna break :)

Try this (slightly tested):

#! perl -slw
use strict;
use threads;
use threads::shared;
use LWP::Simple;
use HTML::TreeBuilder::XPath;

sub locked(\$) :lvalue { lock ${$_[0]}; ${$_[0]} }

our $T //= 3;

my $stdoutSem :shared;
my $running   :shared = 0;

while( my $url = <> ) {
    chomp $url;
    async {
        ++locked( $running );
        if( my $content = get $url ) {
            my $tree = HTML::TreeBuilder::XPath->new();
            $tree->parse( $content );

            # do some processing here on the content
            if( my $title = $tree->findnodes( '/html/head/title' ) ) {
                chomp $title;
                lock $stdoutSem;
                print "$url : $title ";
            }

            # once done then delete the root node
            $tree->delete();
        }
        --locked( $running );
    }->detach;
    Win32::Sleep 500 while $running >= $T;
}

And use it like this:

thisScript.pl urls.list > output.file
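
The throttling idea in the script above (a shared counter of active workers, with the main loop pausing while the counter is at the limit) can be tried without the network or Win32. What follows is only a minimal sketch under stated assumptions, not the posted method: it needs a perl built with ithreads, run_throttled, $peak and $done are hypothetical names introduced for illustration, the fetch is replaced by a short sleep, the counter is incremented in the main thread before each worker starts, and workers are joined at the end rather than detached.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Time::HiRes qw(sleep);    # fractional-second sleep, portable

my $running :shared = 0;      # workers currently active
my $peak    :shared = 0;      # highest concurrency observed (illustration only)
my $done    :shared = 0;      # jobs completed (illustration only)

# Hypothetical helper: run $work->($item) for each item,
# with at most $limit workers active at any one time.
sub run_throttled {
    my ( $limit, $work, @items ) = @_;
    my @workers;

    for my $item (@items) {
        {
            # Count the worker before it starts, so the limit is never overshot.
            lock $running;
            ++$running;
            $peak = $running if $running > $peak;
        }
        push @workers, async {
            eval { $work->($item) };          # a dying job must still be uncounted
            { lock $running; --$running; }
        };

        # Pause the producer while the pool is full.
        sleep 0.01 while $running >= $limit;
    }

    $_->join for @workers;    # wait for the stragglers
    return;
}

# Stand-in for the real work (fetching and parsing a URL).
my $fake_fetch = sub {
    sleep 0.05;
    lock $done;
    ++$done;
};

run_throttled( 2, $fake_fetch, 1 .. 6 );
print "completed $done jobs, peak concurrency $peak\n";
```

Counting in the main thread is a deliberate design choice for the sketch: the check and the increment happen in the same thread, so the number of active workers cannot exceed the limit.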

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Re^2: Perl crashing with Parallel::ForkManager and WWW::Mechanize
by NeonFlash (Novice) on Aug 05, 2012 at 15:29 UTC

    Thank you.

    So, I tried running my script on Linux (an Ubuntu-based distro) and it ran to completion without being killed partway through. This is a new concept for me, thanks again :)

    One question though. When a child process is spawned by the Parent Process, do they run on different cores of a processor or the same?

    For instance, if I set MAX_CHILDREN to 3 so that 3 children run together, do they run on the same core of the processor? :)

    I have a quad-core machine, so I wanted to know whether increasing the value of MAX_CHILDREN will help achieve better speed.

      "When a child process is spawned by the Parent Process"

      When you use fork on Windows, you do not create a child process. You spawn a thread within the existing process that simulates forking.

      More generally -- unless you explicitly restrict them -- all threads are eligible to run on all available processors.

      And processes are threads -- on all platforms. Even a single-threaded process is a thread at the OS level.
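
As a rough illustration of the difference described above, the sketch below forks once and reports the ID the parent gets back. On Unix-like systems that is a real, positive process ID; under Perl's fork emulation on Windows, perlfork describes it as a pseudo-process ID, which is negative. The helper name fork_and_report is hypothetical, introduced only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: fork one child, reap it, and return the ID
# that fork() handed back to the parent.
sub fork_and_report {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {
        # Child (or pseudo-process on Windows): nothing to do here.
        exit 0;
    }

    waitpid( $pid, 0 );    # reap the child so it does not linger
    return $pid;
}

my $child = fork_and_report();
printf "parent %d saw child id %d (%s)\n", $$, $child,
    $child < 0
    ? 'emulated fork: a thread inside this process'
    : 'real fork: a separate process';
```

Either way, the scheduler decides where the child runs; nothing in the sketch pins it to a particular core.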



        Thanks. How about on Linux? In my case the script was failing on Windows but works well on Linux, so there must be some difference in the mechanism. While running the script, I ran the top command and noticed multiple perl processes, each with a unique PID. So on Linux multiple processes are spawned, is that correct?
        I also want to know which factors to consider when setting MAX_CHILDREN in my program. I think it depends on the amount of memory available, the processor speed and the Internet connection speed. What would be a good setting? I am running my script inside an Ubuntu virtual machine with 2 GB of memory and 2 i7 processor cores allocated to it. I have set MAX_CHILDREN to 15 at present, and the script seems to run properly. Can I increase it further, for instance to 20? I believe memory is released automatically.
