PerlMonks  

LWP and WWW:Mechanize not working

by AI Cowboy (Sexton)
on Jun 04, 2013 at 00:40 UTC (#1036855=perlquestion)
AI Cowboy has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, dear Monks,

For a work project, I have been trying to write a Perl program that automatically downloads all the packages linked from a single Google web page. Nothing nefarious, of course: there are literally hundreds of files of 20-100 MB each, and I don't fancy downloading them manually. In fact, I've been instructed to build such a program.
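A minimal sketch of the overall task, assuming the archives are linked with ordinary `<a href>` tags and end in `.zip` (the extension pattern, the script name `fetch_all.pl` and the sub names `extract_links`/`download_all` are hypothetical). It uses modules the poster already loads: HTML::LinkExtor to pull links out of the page, URI to make them absolute, and LWP::UserAgent's `mirror` so that a re-run skips files that are already up to date.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;
use File::Spec;

# Return the absolute URLs of all <a href> links in $html that match
# $pattern, in page order and without duplicates. $base is the URL of
# the page itself, used to resolve relative links.
sub extract_links {
    my ($html, $base, $pattern) = @_;
    my @found;
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        return unless $tag eq 'a' && defined $attr{href};
        my $abs = URI->new_abs($attr{href}, $base)->as_string;
        push @found, $abs if $abs =~ $pattern;
    });
    $parser->parse($html);
    $parser->eof;
    my %seen;
    return grep { !$seen{$_}++ } @found;
}

# Fetch $page_url and save every matching link into $dir.
# mirror() skips files whose local copy is already up to date.
sub download_all {
    my ($page_url, $dir, $pattern) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 60);
    my $res = $ua->get($page_url);
    die "Cannot fetch $page_url: ", $res->status_line, "\n"
        unless $res->is_success;

    -d $dir or mkdir $dir or die "Cannot create $dir: $!\n";
    for my $url (extract_links($res->decoded_content, $page_url, $pattern)) {
        my ($name) = URI->new($url)->path =~ m{([^/]+)\z} or next;
        my $file = File::Spec->catfile($dir, $name);
        my $got  = $ua->mirror($url, $file);
        print "$url -> ", $got->status_line, "\n";
    }
}

# Hypothetical usage: perl fetch_all.pl PAGE_URL OUTPUT_DIR
download_all($ARGV[0], $ARGV[1], qr/\.zip\z/i) if !caller && @ARGV == 2;
```

Keeping the link extraction in its own sub means it can be checked against a saved copy of the page without touching the network.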

The program itself isn't the hard part, since in theory I've already solved the task in several ways. The problem is that when I run even the following test code (the extra "use" statements are left over from earlier attempts):
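#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use LWP::Simple;
use URI::URL;
use WWW::Mechanize;
use HTML::LinkExtor;

my $url = 'http://foo.bar.baz';
getprint($url);     # LWP::Simple: prints the page, or the error status on STDERR
my $user = LWP::UserAgent->new();
$user->get($url);   # the HTTP::Response returned here is discarded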

I get the following error in the command prompt (I am using Windows 8, don't ask):
500 Status read failed: A non-blocking socket operation could not be completed immediately. <URL:http://foo.bar.baz>
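The test script throws away the HTTP::Response that `get` returns, so only getprint reports anything. A minimal sketch that inspects the response instead, assuming the URL is passed on the command line (the `describe_response` name is hypothetical). LWP adds a `Client-Warning: Internal response` header to responses it generates itself, which helps tell a local socket failure like the "Status read failed" error above from a genuine 500 sent by a server.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Summarise a response, flagging errors that LWP generated locally
# (connect or socket failures) rather than received from the server.
sub describe_response {
    my ($res) = @_;
    return 'OK: ' . $res->status_line if $res->is_success;
    my $warning = $res->header('Client-Warning') // '';
    my $origin  = $warning eq 'Internal response' ? 'client-side' : 'server';
    return "FAILED ($origin): " . $res->status_line;
}

# Usage: perl check.pl http://example.com/
if (!caller && @ARGV) {
    my $ua = LWP::UserAgent->new(timeout => 30);
    print describe_response($ua->get($ARGV[0])), "\n";
}
```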

What am I doing wrong with my approach? Is there a way to fix or bypass this? Could a different programming language get the job done? I've gotten the script to download the files themselves (I tried it on one of them manually, using LWP::Simple to save the file to disk), but apparently the page that links to the downloads can't be read.

UPDATE: I've tried wget, curl, and a few other things. Even the Perl method I used yesterday to successfully download a test file from the net no longer works today.

Every time I use LWP to connect ANYWHERE on the net with Perl now, I get the "500 Status read failed" error: "A non-blocking socket operation could not be completed immediately". I'm completely baffled by this. I can fetch local HTML files with LWP, but nothing on the internet, and I have no firewalls up.
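One way to narrow this down is to request the same URL through a second HTTP client. A minimal sketch using HTTP::Tiny, which ships with Perl 5.14 and later (the `summarize` helper is hypothetical). HTTP::Tiny reports connection-level failures (DNS, connect, read, timeout) as status 599 with the error text in `content`. If it succeeds where LWP fails, the problem is more likely in the LWP stack on this machine; if it fails the same way, the network, a proxy or the socket layer is the more likely suspect.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Turn an HTTP::Tiny response hashref into a one-line summary.
# Status 599 means HTTP::Tiny failed internally; the error message
# is then in $res->{content}.
sub summarize {
    my ($res) = @_;
    my $line = "$res->{status} $res->{reason}";
    if ($res->{status} == 599) {
        my $err = $res->{content} // '';
        $err =~ s/\s+\z//;
        $line .= ": $err";
    }
    return $line;
}

# Usage: perl probe.pl http://example.com/
if (!caller && @ARGV) {
    my $http = HTTP::Tiny->new(timeout => 30);
    print summarize($http->get($ARGV[0])), "\n";
}
```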

Re: LWP and WWW:Mechanize not working (you think)
by Anonymous Monk on Jun 04, 2013 at 03:06 UTC

    What am I doing wrong with my approach? Is there a way to fix/bypass this?

    Well, the code you posted wouldn't generate the error message you posted, so that's where you're going wrong. Try

    mech-dump --links http://www.google.com/googlebooks/uspto-patents-grants-text.html
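The same link listing can be produced from Perl with WWW::Mechanize's `links` method. A minimal sketch (the `list_links` name is hypothetical) that resolves every link to an absolute URL with `url_abs`. Since WWW::Mechanize is a subclass of LWP::UserAgent, on the poster's machine it would be expected to hit the same socket error as plain LWP.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Return the absolute URL of every link on the page at $url.
sub list_links {
    my ($url) = @_;
    my $mech = WWW::Mechanize->new(autocheck => 1);   # die on HTTP errors
    $mech->get($url);
    return map { $_->url_abs->as_string } $mech->links;
}

# Usage: perl links.pl http://www.google.com/googlebooks/uspto-patents-grants-text.html
if (!caller && @ARGV) {
    print "$_\n" for list_links($ARGV[0]);
}
```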

    OTOH, why are you even bothering to write a program for this? wget/curl/httrack/lwp-rget... already do this.

      What do you mean, the code I have wouldn't generate that output? I ran the code and copied the output directly; that is exactly what I got.

      I'll take a look at the links you provided.

        Yeah, see LWP and WWW:Mechanize not working :) So getprint gave you a diagnostic error message (WSAEWOULDBLOCK 10035, Resource temporarily unavailable). Seems to be working :)

      Oh, I didn't see the getprint in there :) OTOH:
      $ perl -MLWP::Simple -e " getprint( shift ) " http://foo.bar.baz
      500 Can't connect to foo.bar.baz:80 (Bad hostname) <URL:http://foo.bar.baz>
      I tried mech-dump, and it gave a nearly identical error message. Here it is:

      Error GETing http://www.google.com/googlebooks/uspto-patents-grants-text.html: Status read failed: A non-blocking socket operation could not be completed immediately. at C:\Perl64\bin/mech-dump line 103.

      Also, I can't seem to find where to download curl, wget, or any of the others. Can you help me out? Sorry for the kiddie question; I've gotten really used to easy downloads where the download is a big button on the page :P
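If installing wget or curl is the obstacle, a bare-bones stand-in for a single download can be written with HTTP::Tiny's `mirror` method, which saves a URL to a local file, sends If-Modified-Since when the file already exists, and treats 304 Not Modified as success. A minimal sketch assuming Perl 5.14 or later and plain http:// URLs (https needs IO::Socket::SSL); the `local_name` helper and its `index.html` fallback are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Derive a local file name from the last path segment of a URL,
# ignoring any query string or fragment.
sub local_name {
    my ($url) = @_;
    my ($path) = $url =~ m{^[a-z][a-z0-9+.-]*://[^/]*([^?#]*)}i;
    my ($name) = ($path // '') =~ m{([^/]+)\z};
    return $name // 'index.html';   # hypothetical fallback for "/" URLs
}

# Usage: perl get.pl URL [URL ...]  (saves each into the current directory)
if (!caller) {
    my $http = HTTP::Tiny->new(timeout => 60);
    for my $url (@ARGV) {
        my $file = local_name($url);
        my $res  = $http->mirror($url, $file);
        printf "%s -> %s (%s %s)\n", $url, $file, $res->{status}, $res->{reason};
    }
}
```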

Node Type: perlquestion [id://1036855]
Approved by ww