PerlMonks  

Re: Scrappy user_agent error

by docster (Novice)
on Jan 06, 2012 at 16:58 UTC (#946627)


in reply to Scrappy user_agent error

I am not trying to do anything malicious or hammer sites. All I really wanted to do was download the Alabama city list from Wikipedia, once, and parse it correctly :o)

I decided to do it as a learning experience in Perl web scraping. But if you connect to Wikipedia with Web::Scraper, it refuses the connection with "bad host name" or "invalid user agent", etc. Scrappy was supposed to let you tweak the user_agent, which is why I chose that package, but so far no one really knows how. I could easily have copied and pasted the information long before now, but that would not be as challenging, time-consuming, or fun. I enjoy solving challenges with Perl. It is truly the workhorse of the Internet.

Thanks for all the tips. I may look into some of the other examples posted here. Scrappy looks promising but I think I need to work with an established method rather than an emerging one at this point. :)
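
For anyone who lands here with the same "invalid user agent" refusal, here is a minimal sketch of the established route I have in mind. It is not a Scrappy solution: it uses the core HTTP::Tiny module instead, and only shows how to send your own User-Agent string. The agent string, the contact address, and the sub names are my own placeholders, and I am assuming (not promising) that a descriptive agent string is what the site wants. Fetching an https URL also assumes IO::Socket::SSL is installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical agent string; replace the name and contact details with your own.
my $AGENT = 'AlabamaCityListFetcher/0.1 (one-off personal script; contact: you@example.com)';

# Build a client that sends our own User-Agent header on every request.
sub build_client {
    return HTTP::Tiny->new(agent => $AGENT, timeout => 30);
}

# Fetch a page and return its body, or die with the HTTP status and reason.
sub fetch_page {
    my ($url) = @_;
    my $res = build_client()->get($url);
    die "Fetch failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}

# Only touch the network when a URL is passed on the command line.
if (@ARGV) {
    my $html = fetch_page($ARGV[0]);
    printf "Fetched %d bytes\n", length $html;
}
```

Run it once with the page URL as the only argument, save the output, and do the parsing against the saved copy so the site is not hit again while you debug.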

