 
PerlMonks  


Hi taioba,

You have run afoul of the Robots Exclusion Protocol. Many websites prefer that real humans with real eyeballs visit their site, and some feel strongly enough to ban software "robots" such as LWP::UserAgent. Sciencedirect.com is one of these. If you look at the robots.txt file for sciencedirect.com, you'll see they only let the big boys (Google et al.) spider their site. All others (including you) can go suck rocks.

There is no (legit) solution to this problem except to contact the webmasters and convince them that it is in their interest to allow your program to crawl their site. Good luck with that. Alternatively, see if the site has an RSS feed or an API that provides the data you need. APIs especially are less subject to interdiction by webmasters, since they are designed for program-to-program integration.
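To see how a robots.txt file sorts crawlers into the welcome and the unwelcome, here is a minimal sketch using WWW::RobotRules, the parser that LWP::RobotUA relies on. The robots.txt text, the example.com URLs, the agent name MyCrawler/0.1, and the helper agent_may_fetch are all made up for illustration; this is not the actual file served by sciencedirect.com.

```perl
use strict;
use warnings;
use WWW::RobotRules;

# Hypothetical robots.txt content (an assumption for illustration only):
# one named crawler is allowed everywhere, everyone else is shut out.
my $ROBOTS_TXT = <<'END';
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
END

# Placeholder URL; the rules are stored per host, so the page we ask
# about below must live on the same host as this robots.txt.
my $ROBOTS_URL = 'http://www.example.com/robots.txt';

# Returns 1 if the named agent may fetch $page_url under the rules
# above, and 0 otherwise.
sub agent_may_fetch {
    my ($agent, $page_url) = @_;
    my $rules = WWW::RobotRules->new($agent);
    $rules->parse($ROBOTS_URL, $ROBOTS_TXT);
    return $rules->allowed($page_url) ? 1 : 0;
}

for my $agent ('Googlebot', 'MyCrawler/0.1') {
    my $ok = agent_may_fetch($agent, 'http://www.example.com/some/article');
    printf "%-15s %s\n", $agent, $ok ? 'allowed' : 'disallowed';
}
```

With rules shaped like these, the named crawler gets through and the anonymous one does not. If you would rather have your own program honour such rules automatically, LWP::RobotUA is a subclass of LWP::UserAgent that fetches and checks robots.txt before each request.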

Cheers,

Larry


In reply to Re: LWP::UserAgent Bad and Forbidden requests by 1arryb
in thread LWP::UserAgent Bad and Forbidden requests by taioba
