Re: Scrappy user_agent error

by docster (Novice)
on Jan 06, 2012 at 16:58 UTC (#946627=note)

in reply to Scrappy user_agent error

I am not trying to do anything malicious or hammer sites. All I really wanted to do was download the list of Alabama cities from Wikipedia, once, and parse it correctly :o)

I decided to do it as a learning experience in Perl web scraping. But if you connect to Wikipedia with Web::Scraper, Wikipedia refuses the connection with "bad host name" or "invalid user agent", etc. Scrappy was supposed to let you tweak the user_agent, which is why I chose that package, but so far no one seems to know how. I could easily have copied and pasted the information long before now, but that would not have been as challenging, or as much fun. I enjoy solving challenges with Perl. It is truly the workhorse of the Internet.
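For anyone who lands here with the same problem: Wikimedia's User-Agent policy asks automated clients to identify themselves with a descriptive User-Agent header, and generic library defaults may be refused. Below is a minimal sketch using the core HTTP::Tiny module rather than Scrappy (I could not confirm how Scrappy exposes its user_agent). The agent string and contact address are placeholders, and fetching an https:// URL assumes IO::Socket::SSL and Net::SSLeay are installed.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Placeholder identifying string (hypothetical): replace the name and contact with your own.
my $AGENT = 'AlabamaCityList/0.1 (learning Perl; you@example.com)';

# Build a client that sends the descriptive User-Agent header on every request.
sub make_client {
    return HTTP::Tiny->new(agent => $AGENT, timeout => 30);
}

# Fetch one page and return its HTML, or die with the HTTP status and reason.
sub fetch_page {
    my ($http, $url) = @_;
    my $res = $http->get($url);
    die "GET $url failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}

# Only touch the network when a URL is given on the command line.
if (my $url = shift @ARGV) {
    my $html = fetch_page(make_client(), $url);
    print length($html), " bytes fetched\n";
}
```

Run it with the page URL as the only argument; with no argument it just builds the client. The returned HTML could then be handed to whatever parser you prefer.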

Thanks for all the tips. I may look into some of the other examples posted here. Scrappy looks promising, but I think I need to work with an established method rather than an emerging one at this point. :)
