 
PerlMonks  

problem with getting web source

by WojciechGajewski (Initiate)
on Mar 24, 2013 at 17:02 UTC ( #1025162=perlquestion )
WojciechGajewski has asked for the wisdom of the Perl Monks concerning the following question:

Hi! I want to get the source of a web page from a given URL, so I do it the following way:

    #!/usr/bin/perl
    use LWP::Simple;
    $url = get 'http://www.DevDaily.com/';

This works fine for most pages, but for a few it does not work at all. An example is the following address: http://www.gutenberg.org/files/10002/10002.txt

Why can't I get the source? How can I modify my Perl code to get it? Does it have something to do with changing ports? Thanks in advance, Wojciech.

Re: problem with getting web source
by Corion (Pope) on Mar 24, 2013 at 17:04 UTC

    How does it fail for you?

    Does the URL work from the browser?

    Have you tried error diagnosis by switching from LWP::Simple to LWP::UserAgent and looking at the error returned?

      Yes, it works fine from the browser. You can check it yourself: just click the gutenberg link. By failure I mean that instead of the page source I get an empty string.

        Just because it works for me, from my browser, does not mean that it will work for you, from your browser.

        I suggest you now use LWP::UserAgent to do the URL downloading. That way, you get more detailed error information instead of just a pass/fail status.
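A minimal sketch of the diagnosis suggested above, assuming LWP::UserAgent is installed. The helper name `describe_response` and the script name are hypothetical; `get`, `is_success`, `status_line` and `decoded_content` are the standard LWP::UserAgent / HTTP::Response calls. Unlike LWP::Simple's `get`, the response object carries the server's status line, so a failure shows up with its reason.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Summarise an HTTP::Response: content size on success, status line on failure.
# (describe_response is a hypothetical helper, not part of LWP.)
sub describe_response {
    my ($res) = @_;
    return $res->is_success
        ? 'OK: ' . length( $res->decoded_content // '' ) . ' characters'
        : 'FAILED: ' . $res->status_line;
}

# Run as: perl diagnose.pl URL   (script name is hypothetical)
if (@ARGV) {
    my $ua  = LWP::UserAgent->new( timeout => 30 );
    my $res = $ua->get( $ARGV[0] );
    print describe_response($res), "\n";
}
```

Run against the failing URL, this should print the status the server actually sent rather than silently giving back nothing.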

Re: problem with getting web source
by BrowserUk (Pope) on Mar 24, 2013 at 17:32 UTC

    They are rejecting you ( ERROR 403: Forbidden.) because of the user agent string; use one that resembles a real browser.
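    A hedged sketch of that fix, assuming LWP::UserAgent is installed: set the agent string with the `agent` method before making the request. The helper name `make_browser_like_ua` and the particular browser string are illustrative assumptions; whether a given string is accepted is up to the server.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Build a user agent that sends the given User-Agent header instead of
# the default "libwww-perl/x.y". (make_browser_like_ua is a hypothetical helper.)
sub make_browser_like_ua {
    my ($agent_string) = @_;
    my $ua = LWP::UserAgent->new( timeout => 30 );
    $ua->agent($agent_string);
    return $ua;
}

# Run as: perl fetch.pl URL   (script name is hypothetical)
if (@ARGV) {
    # Example browser-like string; an assumption, not a required value.
    my $ua = make_browser_like_ua(
        'Mozilla/5.0 (X11; Linux x86_64; rv:19.0) Gecko/20100101 Firefox/19.0'
    );
    my $res = $ua->get( $ARGV[0] );
    if ( $res->is_success ) {
        print $res->decoded_content;
    }
    else {
        warn 'Request failed: ', $res->status_line, "\n";
    }
}
```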


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
