PerlMonks  

Re^3: Crawling with Parallel::ForkManager

by tokpela (Chaplain)
on Aug 08, 2009 at 08:55 UTC


in reply to Re^2: Crawling with Parallel::ForkManager
in thread Crawling with Parallel::ForkManager

Just a guess here...

Have you tried downloading the PDF over the $mech connection you are already using? For example:

$mech->get($url_to_pdf);
$mech->save_content( $filename );
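A slightly fuller sketch of the same idea, in case it helps. The helper name save_with_mech is made up for illustration, and the binary => 1 option assumes a reasonably recent WWW::Mechanize; older versions should simply ignore it.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper (not from the original thread): fetch $url over the
# existing $mech session, so its cookies are sent, and save the body to disk.
sub save_with_mech {
    my ($mech, $url, $filename) = @_;

    $mech->get($url);                  # dies on HTTP errors if autocheck is on
    return 0 unless $mech->success;    # covers the autocheck => 0 case

    # Write raw bytes so the PDF is not mangled by text-mode output.
    $mech->save_content($filename, binary => 1);
    return 1;
}
```

You would call it with the same $mech object that did the crawling, e.g. save_with_mech($mech, $url_to_pdf, $filename).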

Maybe this is a cookie issue. I believe $mech accepts cookies by default. If so, fetching the PDF with a separate mirror call opens a different connection that does not carry those cookies, and the web server may refuse a direct request for the PDF that arrives without one.

It might work for you in the browser since your browser would already have a cookie.
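If the download has to stay in a separate mirror step, one thing to try is handing $mech's cookie jar to the other user agent. This is only a sketch under the cookie guess above; mirror_with_mech_cookies is a hypothetical name, and it assumes the separate step uses LWP::UserAgent's mirror method.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper (not from the original thread): mirror $url with a
# plain LWP::UserAgent that reuses the cookies $mech has already collected.
sub mirror_with_mech_cookies {
    my ($mech, $url, $filename) = @_;

    my $ua = LWP::UserAgent->new(
        cookie_jar => $mech->cookie_jar,   # same cookie jar object as $mech
        agent      => $mech->agent,        # same User-Agent string as $mech
    );

    my $response = $ua->mirror($url, $filename);

    # mirror() answers 304 when the local copy is already up to date.
    return $response->is_success || $response->code == 304;
}
```

Note that in a forked child the jar is a copy of the parent's, so cookies picked up in the child do not flow back to the parent.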

