PerlMonks  

Re: Scrappy user_agent error

by docster (Novice)
on Jan 06, 2012 at 16:58 UTC (#946627)


in reply to Scrappy user_agent error

I am not trying to do anything malicious or hammer sites. All I really wanted to do was download the Alabama city list from Wikipedia, once, and parse it correctly :o)

I decided to do it as a learning experience in Perl web scraping. But if you connect to Wikipedia with Web::Scrape, it refuses the connection with "bad host name" or "invalid user agent", etc. Scrappy was supposed to let you tweak the user_agent, which is why I chose that package, but so far no one really knows how. I could easily have copied and pasted the information long before now, but that is not as challenging, time-consuming, or fun. I enjoy solving challenges with Perl. It is truly the workhorse of the Internet.

Thanks for all the tips. I may look into some of the other examples posted here. Scrappy looks promising but I think I need to work with an established method rather than an emerging one at this point. :)
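
For anyone landing here with the same problem, here is a minimal sketch of the "established method" route. It assumes the core HTTP::Tiny module rather than Scrappy, and it does not show how Scrappy itself sets user_agent. The agent string, the helper names make_client and fetch_page, and the contact address are all made up for illustration, and the URL is taken from the command line rather than hard-coded.

```perl
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14

# Hypothetical identifying string (an assumption, not from the thread);
# replace the contact details with your own.
my $AGENT = 'AlabamaCityListFetcher/0.1 (one-off script; contact: you@example.com)';

# Build a client that sends our own User-Agent header on every request.
sub make_client {
    my ($agent) = @_;
    return HTTP::Tiny->new( agent => $agent, timeout => 30 );
}

# Fetch a page once and return its body, or die with the HTTP status.
sub fetch_page {
    my ( $client, $url ) = @_;
    my $res = $client->get($url);
    die "Fetch failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}

my $client = make_client($AGENT);

# Only touch the network when a URL is passed on the command line.
if (@ARGV) {
    my $html = fetch_page( $client, $ARGV[0] );
    printf "Fetched %d bytes\n", length $html;
}
```

Run it as `perl fetch.pl <url>` with the page you want. Note that HTTP::Tiny needs IO::Socket::SSL installed for https URLs. The fetched HTML still has to be parsed separately; this only covers the user agent part of the question.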

