PerlMonks  

Re: HTTP Scripting

by ajt (Prior)
on Nov 29, 2002 at 12:30 UTC (#216514)


in reply to HTTP Scripting

marinersk,

Perl is good at this kind of thing. It has modules to connect to web servers (LWP), to handle cookies and passwords, and to parse HTML (the HTML:: family of modules).
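As a rough sketch of what that looks like in practice, the snippet below builds an LWP::UserAgent with an in-memory cookie jar and registers a username and password for HTTP Basic authentication. The agent string, host, realm, and login details are all made-up placeholders, and the helper names build_agent and fetch_page are only for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Build a user agent that remembers cookies (in memory) between requests.
sub build_agent {
    return LWP::UserAgent->new(
        agent      => 'monk-sketch/0.1',      # hypothetical agent name
        timeout    => 15,                     # seconds
        cookie_jar => HTTP::Cookies->new,     # no file given, so nothing is saved to disk
    );
}

# Fetch one page and return its decoded body, or die with the status line.
sub fetch_page {
    my ($ua, $url) = @_;
    my $res = $ua->get($url);
    die "GET $url failed: " . $res->status_line . "\n" unless $res->is_success;
    return $res->decoded_content;
}

my $ua = build_agent();

# Placeholder credentials: host:port, realm, username, password.
$ua->credentials('www.example.com:80', 'Members Only', 'someuser', 'secret');

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $body = fetch_page($ua, $ARGV[0]);
    print length($body), " characters fetched\n";
}
```

Run it with a URL as the first argument to fetch a page; with no arguments it only sets up the agent. Sites that log you in through an HTML form rather than Basic authentication would need a POST of the form fields instead, with the cookie jar carrying the session afterwards.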

Perl has several HTML/XML parsers. Some are general-purpose parsers, and some are dedicated to one job, e.g. link extractors and header parsers.
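To give a feel for one of the dedicated parsers, here is a minimal sketch using HTML::LinkExtor to pull the anchor links out of a page. The HTML fragment, the base URL, and the helper name extract_links are invented for the example; in real use the HTML would come from a fetched page.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Return the href of every <a> tag in $html, made absolute against $base.
sub extract_links {
    my ($html, $base) = @_;

    # With no callback, the parser collects links for retrieval via links().
    my $parser = HTML::LinkExtor->new(undef, $base);
    $parser->parse($html);
    $parser->eof;

    my @urls;
    for my $link ($parser->links) {
        my ($tag, %attr) = @$link;          # e.g. ('a', href => URI object)
        next unless $tag eq 'a' && defined $attr{href};
        push @urls, "$attr{href}";          # stringify the URI object
    }
    return @urls;
}

# Made-up sample input; the <img> is reported by the parser but skipped here.
my $html = '<p><a href="/index.pl?node=Tutorials">Tutorials</a> '
         . '<img src="logo.gif"> '
         . '<a href="http://www.example.com/">Example</a></p>';

my @links = extract_links($html, 'http://www.example.org/');
print "$_\n" for @links;
```

A general-purpose parser such as HTML::TokeParser or HTML::TreeBuilder would be the better fit when you need the text around the links as well as the links themselves.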

You could argue that your choice is so wide that it becomes daunting!

I would suggest the following books. Perl and LWP is all about connecting to, collecting from, and parsing web data. I would also suggest Data Munging with Perl. It's a little older and more generic (it covers more than just web automation), but it's a fine book and has good examples of web data mining. Web Client Programming with Perl is old and out of print, but it's freely available as an OpenBook from O'Reilly, and quite useful.

I would also check out merlyn's columns, as I think there are some good examples in there with good descriptions. There may also be something in Perl.com's article archive.


--
ajt


Re: Re: HTTP Scripting
by marinersk (Chaplain) on Nov 29, 2002 at 15:24 UTC
    Thanks, ajt, excellent book references, and things which will gladly join my growing library.