LWP is quite possibly one of the most popular Perl modules. Most Perl programmers have used it for everything from fetching the headlines from a news-related web site to writing a link checker or a one-line browser.

O'Reilly's new book Perl & LWP is a thin but extremely useful guide for the beginning to intermediate Perl programmer who has an interest in mining the web for specific information. Although it runs to only about 196 pages, it covers a lot of ground:

  • 1. Introduction to Web Automation
    • The Web as Data Source
    • History of LWP
    • Installing LWP
    • Words of Caution
    • LWP in Action
  • 2. Web Basics
    • URLs
    • An HTTP Transaction
    • LWP::Simple
    • Fetching Documents Without LWP::Simple
    • Example: AltaVista
    • Example: Babelfish
  • 3. The LWP Class Model
    • The Basic Classes
    • Programming with LWP Classes
    • Inside the do_GET and do_POST Functions
    • User Agents
    • HTTP::Response Objects
    • LWP Classes: Behind the Scenes
  • 4. URLs
    • Parsing URLs
    • Relative URLs
    • Converting Absolute URLs to Relative
    • Converting Relative URLs to Absolute
  • 5. Forms
    • Elements of an HTML Form
    • LWP and GET Requests
    • Automating Form Analysis
    • Idiosyncrasies of HTML Forms
    • POST Example: License Plates
    • POST Example:
    • File Uploads
    • Limits on Forms
  • 6. Simple HTML Processing with Regular Expressions
    • Automating Data Extraction
    • Regular Expression Techniques
    • Troubleshooting
    • When Regular Expressions Aren't Enough
    • Example: Extracting Links from a Bookmark File
    • Example: Extracting Links from Arbitrary HTML
    • Example: Extracting Temperatures from Weather Underground
  • 7. HTML Processing with Tokens
    • HTML as Tokens
    • Basic HTML::TokeParser Use
    • Individual Tokens
    • Token Sequences
    • More HTML::TokeParser Methods
    • Using Extracted Text
  • 8. Tokenizing Walkthrough
    • The Problem
    • Getting the Data
    • Inspecting the HTML
    • First Code
    • Narrowing In
    • Rewrite for Features
    • Alternatives
  • 9. HTML Processing with Trees
    • Introduction to Trees
    • HTML::TreeBuilder
    • Processing
    • Example: BBC News
    • Example: Fresh Air
  • 10. Modifying HTML with Trees
    • Changing Attributes
    • Deleting Images
    • Detaching and Reattaching
    • Attaching in Another Tree
    • Creating New Elements
  • 11. Cookies, Authentication, and Advanced Requests
    • Cookies
    • Adding Extra Request Header Lines
    • Authentication
    • An HTTP Authentication Example: The Unicode Mailing Archive
  • 12. Spiders
    • Types of Web-Querying Programs
    • A User Agent for Robots
    • Example: A Link-Checking Spider
    • Ideas for Further Expansion
  • Appendices
    • A. LWP Modules
    • B. HTTP Status Codes
    • C. Common MIME Types
    • D. Language Tags
    • E. Common Content Encodings
    • F. ASCII Table
    • G. User's View of Object-Oriented Modules
  • Index

I would like to have seen a greater focus on examples, some screenshots, and more module usage, but this is still a very good book. As for audience requirements, the reader should know at least basic Perl concepts. To make full use of the book, though, a good knowledge of subroutines (regular and anonymous), references, and regexes is critical. A fundamental understanding of objects is also important, although not required.

Although this is a small book, what you will gain by reading it isn't.


Edited: ~Thu Aug 1 13:51:24 2002 (GMT) by footpad: Added HTML formatting tags

In reply to Perl & LWP by DigitalKitty
