WojciechGajewski has asked for the wisdom of the Perl Monks concerning the following question:

Hi! I want to get the source of a web page from a given URL, so I do it the following way:
#!/usr/bin/perl
use LWP::Simple;
$url = get '';
This works fine for most pages, but for a few it does not work at all. An example would be the following address: Why can't I get the source code? How can I modify my Perl code to get the source? Does it have something to do with changing ports? Thanks in advance, Wojciech.

Replies are listed 'Best First'.
Re: problem with getting web source
by BrowserUk (Pope) on Mar 24, 2013 at 17:32 UTC

    They are rejecting you (ERROR 403: Forbidden) because of the user agent string; use one that resembles a real browser.
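A minimal sketch of that suggestion, assuming the server only checks the User-Agent header. The agent string and the helper name make_browser_like_ua are illustrative choices, not anything the site is known to require, and the URL is taken from the command line because the failing address is not shown in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Assumption: any realistic browser identifier will do; this one is only an example.
my $BROWSER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:19.0) Gecko/20100101 Firefox/19.0';

# Build a user agent that announces itself like a browser
# instead of the default "libwww-perl/x.y" string.
sub make_browser_like_ua {
    my $ua = LWP::UserAgent->new(timeout => 30);
    $ua->agent($BROWSER_AGENT);
    return $ua;
}

# Fetch only when a URL is supplied as the first argument.
if (my $url = shift @ARGV) {
    my $response = make_browser_like_ua()->get($url);
    if ($response->is_success) {
        print $response->decoded_content;
    }
    else {
        die "Request failed: ", $response->status_line, "\n";
    }
}
```

If the page still comes back as 403 with a browser-like agent, the server is probably filtering on something else (cookies, referrer, IP address), and the status line printed on failure is the place to start looking.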

Re: problem with getting web source
by Corion (Pope) on Mar 24, 2013 at 17:04 UTC

    How does it fail for you?

    Does the URL work from the browser?

    Have you tried error diagnosis by switching from LWP::Simple to LWP::UserAgent and looking at the error returned?

      Yes, it works fine from the browser. You can check it yourself: just click the Gutenberg link. By failure I mean that instead of the page source I get an empty string.

        Just because it works for me, from my browser, does not mean that it will work for you, from your browser.

        I suggest you now use LWP::UserAgent to do the URL downloading. That way, you get more detailed error information instead of just a pass/fail status.
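As a rough illustration of the difference: LWP::Simple's get() returns undef on any failure, while the HTTP::Response object returned by LWP::UserAgent carries the status code and message. The helper name describe_response is made up for this sketch, and the URL is again read from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

# Turn a response object into a one-line diagnosis.
# On failure, status_line gives the code and message, e.g. "403 Forbidden".
sub describe_response {
    my ($response) = @_;
    return $response->is_success
        ? 'OK: ' . length($response->decoded_content // '') . ' characters received'
        : 'FAILED: ' . $response->status_line;
}

# Fetch only when a URL is supplied as the first argument.
if (my $url = shift @ARGV) {
    my $ua = LWP::UserAgent->new;
    print describe_response($ua->get($url)), "\n";
}
```

Run against the problem address, this should show whether the empty result is a 403, a redirect problem, a timeout, or something else entirely.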