Re: Being a heretic and going against the party line.

by davorg (Chancellor)
on Oct 03, 2002 at 13:32 UTC (#202511)

in reply to Being a heretic and going against the party line.

How well will your solution work when the layout of the page changes subtly? I'm not pretending that the solution I've given here is bulletproof, but it's a lot more flexible than yours.

The point is that HTML parsers understand HTML. It's easier to write a solution when you use the right tool for the job. If you look at my solution, the code is very easy to follow - find all the table rows in the HTML, then find one where the text starts with the required IP address, then extract all of the text from that row. I didn't need to go into the detail of the HTML in the same way that you did.
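For readers who want to see the shape of those three steps, here is a minimal sketch. It is not the code from the earlier node: it is written in Python with the standard library's html.parser rather than a Perl module, and the names RowTextParser and row_for_ip, the sample table and the addresses are all made up for illustration. It also assumes a flat table with no nested tables.

```python
from html.parser import HTMLParser


class RowTextParser(HTMLParser):
    """Collect the text of every <tr> as a list of cell strings.

    Hypothetical helper for illustration; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell texts
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text chunks of the cell being read, or None

    def _flush_cell(self):
        if self._cell is not None:
            # Join the chunks and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _flush_row(self):
        if self._row is not None:
            self._flush_cell()
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush_row()   # copes with a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()  # copes with a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._flush_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def row_for_ip(html, ip):
    """Return the cell texts of the first row whose first cell is ip, else None."""
    parser = RowTextParser()
    parser.feed(html)
    parser.close()
    for row in parser.rows:
        if row and row[0] == ip:
            return row
    return None


if __name__ == "__main__":
    sample = """<table>
    <tr><th>Address</th><th>Host</th></tr>
    <tr class="odd"><td>192.0.2.10</td><td>printer</td></tr>
    <tr><td><b>192.0.2.1</b></td><td>gateway</td></tr>
    </table>"""
    print(row_for_ip(sample, "192.0.2.1"))
```

One deliberate deviation from "the text starts with the required IP address": the sketch compares the whole first cell, so that looking for 192.0.2.1 does not also match 192.0.2.10. Note that added attributes, extra whitespace or inline markup inside a cell don't change the result, which is the kind of subtle layout change I had in mind above.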

Yes, it's possible to extract useful data from HTML using regular expressions (the most excellent book Perl & LWP is full of them) but that can only ever be a "use once", quick and dirty hack.

Oh, and a final comment on your terminology. What we're all doing in this problem is parsing. Data extraction is parsing by any meaningful definition of the term.


"The first rule of Perl club is you do not talk about Perl club."
-- Chip Salzenberg
