Re: Scrappy user_agent error

by docster (Novice)
on Jan 06, 2012 at 16:58 UTC

in reply to Scrappy user_agent error

I am not trying to do anything malicious or hammer sites. All I really wanted to do was download the Alabama city list from Wikipedia, once, and parse it correctly :o)

I decided to do it as a learning experience in Perl web scraping. But if you connect to Wikipedia with Web::Scraper, it refuses the connection with "bad host name" or "invalid user agent", etc. Scrappy was supposed to let you tweak the user_agent, which is why I chose that package, but so far no one really knows how. I could have easily copied and pasted the information long before now, but that is not as challenging, time-consuming, or fun. I enjoy solving challenges with Perl. It is truly the workhorse of the Internet.
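For anyone landing here with the same problem, here is a minimal sketch of one established way to send a custom User-Agent. It uses plain LWP::UserAgent rather than Scrappy, and it assumes the refusal really is caused by the default agent string. The agent string, the contact address, and the helper names make_client and fetch_page are made up for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical identifying string: swap in your own script name and contact.
my $AGENT = 'AlabamaCityList/0.1 (one-off learning script; contact: you@example.com)';

# Build a client that sends a descriptive User-Agent instead of the
# library default.
sub make_client {
    my ($agent) = @_;
    return LWP::UserAgent->new(agent => $agent, timeout => 30);
}

# Fetch a page once and return its decoded body, or die with the
# HTTP status line so the reason for a refusal is visible.
sub fetch_page {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    die 'Request failed: ' . $response->status_line . "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

my $ua = make_client($AGENT);
```

A single call such as `my $html = fetch_page($ua, $url);` would then download the page once, with `$url` set to the Wikipedia list page you are after. The returned HTML can be handed to whatever parser you prefer.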

Thanks for all the tips. I may look into some of the other examples posted here. Scrappy looks promising but I think I need to work with an established method rather than an emerging one at this point. :)