Re^3: Crawling with Parallel::ForkManager

by tokpela (Chaplain)
on Aug 08, 2009 at 08:55 UTC


in reply to Re^2: Crawling with Parallel::ForkManager
in thread Crawling with Parallel::ForkManager

Just a guess here...

Have you tried downloading the PDF with the $mech object you are already using? For example:

$mech->get($url_to_pdf);
$mech->save_content($filename);

Maybe this is a cookie issue. I believe $mech accepts cookies by default. If so, fetching the PDF with a separate mirror call opens a new connection that does not carry those cookies, and the web server may refuse a direct request for the file when the cookie is missing.

It might work for you in the browser since your browser would already have a cookie.
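To make the idea concrete, here is a rough sketch of fetching the PDF through the same $mech session so that any cookies set by the earlier page go along with the request. The helper name download_with_session and the command-line arguments are made up for illustration, and I am only assuming that cookies are the problem.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper (not part of WWW::Mechanize): fetch a URL with an
# existing $mech object, so its cookie jar is reused, and save the body.
sub download_with_session {
    my ( $mech, $url_to_pdf, $filename ) = @_;

    $mech->get($url_to_pdf);
    return 0 unless $mech->success;    # only reached when autocheck is off

    $mech->save_content($filename);
    return 1;
}

# Example run (arguments are assumptions for this sketch):
#   perl fetch_pdf.pl START_URL PDF_URL OUTFILE
if ( @ARGV == 3 ) {
    my ( $start_url, $url_to_pdf, $filename ) = @ARGV;

    my $mech = WWW::Mechanize->new( autocheck => 0 );

    # Visit the page first so any cookies the server sets land in $mech's jar.
    $mech->get($start_url);
    die "Could not load $start_url\n" unless $mech->success;

    download_with_session( $mech, $url_to_pdf, $filename )
        or die "Could not download $url_to_pdf: ", $mech->status, "\n";
}
```

If you would rather keep the mirror call in a separate LWP::UserAgent, passing cookie_jar => $mech->cookie_jar to its constructor should let both objects share the same cookies, though I have not tried that against your site.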

