PerlMonks  

LWP and WWW:Mechanize not working

by AI Cowboy (Beadle)
on Jun 04, 2013 at 00:40 UTC ( [id://1036855]=perlquestion )

AI Cowboy has asked for the wisdom of the Perl Monks concerning the following question:

Greetings dear Monks,

For a work project, I have been trying to write a Perl program that automatically downloads all the packages linked from a single Google web page. Nothing nefarious, of course, but there are literally hundreds of files there, each 20-100 MB, and I don't fancy doing this manually. I've actually been instructed to build such a program.

The program itself isn't the hard part; in theory I've already solved the task in several ways. The problem is that when I run even the following test code (with extra "use" statements left over from previous attempts):
#!/usr/bin/perl
use LWP::UserAgent;
use LWP::Simple;
use URI::URL;
use WWW::Mechanize;
use HTML::LinkExtor;

my $url = 'http://foo.bar.baz';
getprint('http://foo.bar.baz');
$user = LWP::UserAgent->new();
$user->get($url);

I get the following error in command prompt (I am using Windows 8, don't ask):
500 Status read failed: A non-blocking socket operation could not be completed immediately. <URL:http://foo.bar.baz>

What am I doing wrong with my approach? Is there a way to fix/bypass this? Could a different programming language get the job done? I've gotten the script to download the files successfully (I tried on one of them manually with the script using LWP::Simple to save the file on my disk), but the page that links the downloads is unreadable apparently.

UPDATE: I've tried wget, curl, and a few other things. Even the Perl method that worked yesterday to download a test file off the net no longer works today.

Every time I use LWP to connect ANYWHERE on the net with Perl now, it gives me the "500 Status read failed" error: "A non-blocking socket operation could not be completed immediately". I'm completely baffled by this. I can open local HTML files with LWP, but nothing on the internet, and I have no firewalls up.
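For anyone debugging a similar failure, here is a minimal sketch of how to tell whether a 500 came from the remote server or was generated locally by LWP because the request never completed. It relies on the `Client-Warning: Internal response` header that LWP attaches to responses it fabricates itself. The helper name `describe_response` and the 30-second timeout are illustrative assumptions, not part of the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: summarise an HTTP::Response and say whether a
# failure was produced locally by LWP or returned by the remote server.
sub describe_response {
    my ($response) = @_;
    return 'ok: ' . $response->status_line if $response->is_success;

    # LWP marks responses it generates itself (DNS failure, socket
    # error, timeout, ...) with this header.
    my $warning = $response->header('Client-Warning') // '';
    my $origin  = $warning eq 'Internal response' ? 'client-side' : 'server-side';
    return "$origin failure: " . $response->status_line;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $url = shift @ARGV;
    my $ua  = LWP::UserAgent->new( timeout => 30 );   # timeout is an assumption
    print describe_response( $ua->get($url) ), "\n";
}
```

A "client-side failure" result would suggest the problem lies in the local network stack or Perl installation rather than in the page being fetched.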

Replies are listed 'Best First'.
Re: LWP and WWW:Mechanize not working (you think)
by Anonymous Monk on Jun 04, 2013 at 03:06 UTC

    What am I doing wrong with my approach? Is there a way to fix/bypass this?

    Well, the code you posted wouldn't generate the error message you posted, so this is where you're going wrong. Try:

    mech-dump --links http://www.google.com/googlebooks/uspto-patents-grants-text.html

    OTOH, why are you even bothering to write a program for this? wget/curl/httrack/lwp-rget... already do this.
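As a rough idea of what listing the links involves once the page can actually be fetched, here is a sketch using HTML::LinkExtor (already loaded in the original test code). The helper name `archive_links`, the `.zip` suffix, the example.com base URL, and the sample HTML are all assumptions made for illustration; the thread does not say what the real file names look like.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper: return the absolute URLs of all <a href> links
# in $html whose URL ends in $suffix. $base is the URL of the page the
# HTML came from and is used to resolve relative links.
sub archive_links {
    my ($html, $base, $suffix) = @_;

    # No callback given, so the parser collects links for ->links.
    my $parser = HTML::LinkExtor->new(undef, $base);
    $parser->parse($html);
    $parser->eof;

    my @urls;
    for my $link ($parser->links) {
        my ($tag, %attr) = @$link;
        next unless $tag eq 'a' && defined $attr{href};
        my $url = "$attr{href}";               # stringify the URI object
        push @urls, $url if $url =~ /\Q$suffix\E\z/;
    }
    return @urls;
}

# Made-up sample page; a real run would pass in the fetched HTML.
my $sample = '<a href="files/one.zip">one</a> <a href="about.html">about</a>';
print "$_\n" for archive_links($sample, 'http://example.com/index.html', '.zip');
```

Each URL returned this way could then be handed to whatever download call already works for single files.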

      What do you mean, the code I have wouldn't generate that output? I ran the code and copied the output directly - that is exactly what I got.

      I'll take a look at the links you provided.

        Yeah, see LWP and WWW:Mechanize not working :) So getprint gave you a diagnostic error message (WSAEWOULDBLOCK 10035, Resource temporarily unavailable); it seems to be working :)
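For background on what that diagnostic means, the sketch below reproduces the general "would block" condition on a Unix-like system with core modules only: a non-blocking read on a socket with no data pending fails with EWOULDBLOCK/EAGAIN, which is the counterpart of Winsock's WSAEWOULDBLOCK. It illustrates the error class and is not a diagnosis of why LWP hit it on the poster's machine; the helper name `read_would_block` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;
use IO::Handle;

# Hypothetical helper: attempt one non-blocking read and report whether
# the OS answered "would block" (no data available right now).
sub read_would_block {
    my ($sock) = @_;
    $sock->blocking(0);                        # switch to non-blocking mode
    my $n = sysread($sock, my $buf, 1024);
    return !defined($n) && ( $!{EWOULDBLOCK} || $!{EAGAIN} ) ? 1 : 0;
}

# A connected pair of local sockets stands in for a network connection.
socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

print "empty socket: would block = ", read_would_block($reader), "\n";
syswrite($writer, "hello");
print "after write:  would block = ", read_would_block($reader), "\n";
```

Code that uses non-blocking sockets is normally expected to wait and retry when it sees this condition rather than treat it as fatal.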

      I tried mech dump, and it gave a nearly identical error message - here it is:

      Error GETing http://www.google.com/googlebooks/uspto-patents-grants-text.html: Status read failed: A non-blocking socket operation could not be completed immediately. at C:\Perl64\bin/mech-dump line 103.

      Also, I can't seem to locate a way to download curl, wget, or either of the others - can you help me out? Sorry for the kiddy question, I've gotten really used to easy downloads where the download is a big button on the page :P
      oh, didn't see the getprint in there :) OTOH

      $ perl -MLWP::Simple -e " getprint( shift ) " http://foo.bar.baz
      500 Can't connect to foo.bar.baz:80 (Bad hostname) <URL:http://foo.bar.baz>
