PerlMonks  


How well will your solution work when the layout of the page changes subtly? I'm not pretending that the solution I've given here is bulletproof, but it's a lot more flexible than yours.

The point is that HTML parsers understand HTML. It's easier to write a solution when you use the right tool for the job. If you look at my solution, the code is very easy to follow: find all the table rows in the HTML, then find the one whose text starts with the required IP address, then extract all of the text from that row. I didn't need to go into the detail of the HTML in the way that you did.
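
For readers who want to see those three steps spelled out, here is a minimal sketch. It is in Python rather than Perl only so that it runs with nothing but the standard library; it is not the code posted earlier in this thread. The names `RowTextParser` and `row_for_ip` are made up for the illustration, and it assumes the IP address is the whole text of the row's first cell, which is a slightly stricter test than "starts with".

```python
from html.parser import HTMLParser


class RowTextParser(HTMLParser):
    """Collect the text of every <tr> as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per table row
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def _close_cell(self):
        # Store the finished cell with runs of whitespace collapsed.
        if self._cell is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored, so cosmetic markup changes do not matter.
        if tag == "tr":
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()

    def handle_data(self, data):
        # Only text inside a cell is kept; inline tags such as <b> are skipped.
        if self._cell is not None:
            self._cell.append(data)


def row_for_ip(html, ip):
    """Return the cell texts of the first row whose first cell is `ip`, else None."""
    parser = RowTextParser()
    parser.feed(html)
    parser.close()
    for row in parser.rows:
        if row and row[0] == ip:
            return row
    return None
```

Calling `row_for_ip(page_source, "10.0.0.2")` returns the list of cell texts for that row, or `None` if no row matches. Nothing in it depends on attributes, letter case of the tags, or line breaks in the markup, which is the flexibility being argued for.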

Yes, it's possible to extract useful data from HTML using regular expressions (the most excellent book Perl & LWP is full of them), but that can only ever be a "use once", quick-and-dirty hack.
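
To make the "use once" point concrete, here is a small Python sketch (again Python only so it runs on the standard library alone). The pattern and both HTML snippets are invented for the illustration and are not anyone's code from this thread; they just show what a regex tied to one exact layout does when the markup gets a cosmetic change.

```python
import re

# Hypothetical pattern written against one exact layout of the page.
ROW = re.compile(r"<tr><td>(\d+\.\d+\.\d+\.\d+)</td><td>([^<]*)</td></tr>")

# The layout the pattern was written for.
original = "<tr><td>10.0.0.2</td><td>beta</td></tr>"

# The same data after a cosmetic change: one attribute and some whitespace.
restyled = '<tr class="odd">\n  <td>10.0.0.2</td>\n  <td>beta</td>\n</tr>'


def extract(html):
    """Return the (ip, name) pairs that the layout-specific pattern finds."""
    return ROW.findall(html)


print(extract(original))  # one match
print(extract(restyled))  # no matches, although the data is unchanged
```

The pattern could of course be loosened to allow attributes and whitespace, but each such patch re-implements a little more of what an HTML parser already does.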

Oh, and a final comment on your terminology. What we're all doing in this problem is parsing. Data extraction is parsing by any meaningful definition of the term.

--
<http://www.dave.org.uk>

"The first rule of Perl club is you do not talk about Perl club."
-- Chip Salzenberg


In reply to Re: Being a heretic and going against the party line. by davorg
in thread Being a heretic and going against the party line. by BrowserUk
