PerlMonks  

appaling (sic), you say?

Well, the nested tables are awkward and the use of various outdated or deprecated tags is unfortunate; the lack of quotes and the like can certainly be labeled "mistakes." But "appalling" is a pretty strong word. Perhaps "dated" or similar would be better.

...so bad as to be practically of no use.

Even harsher (and, IMO, excessive), particularly since what we know about the HTML fails to support any inference that OP bears any responsibility for it.

There is, however, a valuable nugget that saves your post from a quick downvote -- the notion that future changes could break a regex solution. OTOH, any solution we can readily offer today would also break were the HTML converted to 100% compliant XML.
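
To make the fragility point concrete, here is a minimal sketch (in Python rather than Perl, purely for illustration). The markup, the `price` class, and the helper names `regex_price` and `parser_price` are all hypothetical and not taken from OP's documents. It assumes a cell with an unquoted attribute that someone later "cleans up" by adding quotes: a regex written against the old form stops matching, while a tolerant parser still finds the value. That only shows a parser surviving this one kind of change; a different restructuring could break either approach.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: the same cell before and after attribute quotes are added.
OLD = "<td class=price>42</td>"
NEW = '<td class="price">42</td>'

# A regex written against the old, unquoted form only.
PRICE_RE = re.compile(r"<td class=price>(\d+)</td>")

def regex_price(html):
    """Return the captured digits, or None when the pattern no longer matches."""
    m = PRICE_RE.search(html)
    return m.group(1) if m else None

class PriceParser(HTMLParser):
    """Collect the text of the first <td> whose class attribute is 'price'."""
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.price = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; quoting is already normalized.
        if tag == "td" and ("class", "price") in attrs:
            self.in_price = True

    def handle_data(self, data):
        if self.in_price and self.price is None:
            self.price = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_price = False

def parser_price(html):
    p = PriceParser()
    p.feed(html)
    p.close()
    return p.price

for label, doc in (("old", OLD), ("new", NEW)):
    print(label, regex_price(doc), parser_price(doc))
```

The regex returns a value for the old markup and `None` for the new one; the parser returns the same value for both.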


In reply to Re^3: how to quickly parse 50000 html documents? (Updated: 50,000 pages in 3 minutes!) by ww
in thread how to quickly parse 50000 html documents? by brengo
