PerlMonks
How interesting. That sounds like prescient advice to me, provided the HTML actually qualifies as XHTML. The OP should clearly be using one of the several HTML parsers that are readily available here, so that the particular strings needing further processing are handed over "on a silver platter," as it were. Applying regular expressions to the raw HTML string would be a monumental waste of effort and would not produce results nearly as good.
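To make the "silver platter" idea concrete, here is a minimal sketch of a parser handing back just the strings of interest. It is written in Python with only the standard library so that it runs as-is; it is not the OP's code, and the `LinkCollector` class, the `extract_links` helper, and the `SAMPLE` markup are all hypothetical names invented for this illustration. The markup deliberately includes an upper-case tag, a tag split across lines, nested markup inside the link, and single-quoted attributes, which are the kinds of variation a hand-written regular expression tends to trip over.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a> elements. Hypothetical helper."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag and attribute names for us.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Feed the markup to the parser and return the collected links."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical sample markup, invented for this sketch.
SAMPLE = """<p>See <A
   HREF="/node/1" title="intro">the <b>first</b> node</A>
and <a href='/node/2'>second</a>.</p>"""

for href, text in extract_links(SAMPLE):
    print(href, "->", text)
```

The calling code never looks at angle brackets, quoting, or letter case; it only receives the values it asked for. The same division of labor applies with whichever HTML parser module is actually used.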
But "XPath is even better, if it works," because the strategy is non-procedural. If it works, you do not have to write code that is tied to the structure of the parent document ... and that would therefore stop working if that structure changed. What you really want to avoid, and what XPath is very much engineered to let you avoid, is programming that is specific to the exact structure of the XML. Such logic is not only fragile ... it is also unable to realize that it is now producing incomplete answers. (Having said that: "XPath, also, is not a panacea.")

In reply to Re^3: Parsing a large html with perl
by sundialsvc4
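The point about structure-specific logic silently producing incomplete answers can be sketched as follows. This is an illustration only, in Python with the standard library's `xml.etree.ElementTree`, which supports a limited subset of XPath. The two documents and the function names `links_by_position` and `links_by_xpath` are hypothetical, and the markup omits the XHTML namespace for brevity (real XHTML would need namespace-aware paths).

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML-style document (no namespace, for brevity).
OLD_LAYOUT = """<html><body>
  <div id="content">
    <a class="result" href="/node/1">first</a>
    <a class="result" href="/node/2">second</a>
  </div>
</body></html>"""

# Same data after a redesign: one link has moved into a new wrapper element.
NEW_LAYOUT = """<html><body>
  <div id="content">
    <a class="result" href="/node/1">first</a>
    <div class="extra">
      <a class="result" href="/node/2">second</a>
    </div>
  </div>
</body></html>"""


def links_by_position(xhtml):
    """Structure-tied: html -> body -> first div -> its direct <a> children."""
    root = ET.fromstring(xhtml)
    content = root[0][0]
    return [el.get("href") for el in content if el.tag == "a"]


def links_by_xpath(xhtml):
    """Declarative: every <a class="result">, at any depth."""
    root = ET.fromstring(xhtml)
    return [el.get("href") for el in root.findall(".//a[@class='result']")]


for name, doc in (("old", OLD_LAYOUT), ("new", NEW_LAYOUT)):
    print(name, links_by_position(doc), links_by_xpath(doc))
```

On the old layout both functions return the same two links. On the new layout the positional walk returns only `/node/1` and raises no error, which is exactly the "incomplete answers" failure mode, while the path expression still finds both. The caveat in the post still holds: a path expression is itself an assumption about the document (here, that results are marked with `class="result"`), so it can also go stale.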