PerlMonks
Re: can't extract node with HTML::TreeBuilder::XPath
by tobyink (Canon) on Jul 29, 2012 at 19:54 UTC ( [id://984312] )
Per spec, the XPath is wrong. Given the following HTML:

The correct XPath to select the table cell is along the lines of //table/tbody/tr/td. Yes, there's an invisible <tbody> element in there! A standards-compliant HTML parser will always insert the <tbody> element for you if you leave it out.

The /a/h3 part of the XPath is an interesting feature too. In HTML 4.x, <h3> is not a permitted child of <a>, and what a parser should do when it encounters one is undefined. Some parsers may close the <a> element early, so that the <h3> ends up as a sibling of it rather than a child of it. But under HTML5 rules, <h3> is a permitted child of <a>.

What does HTML::TreeBuilder do? Who knows!? HTML::TreeBuilder's documentation is pretty vague. This is exactly the sort of reason I maintain HTML::HTML5::Parser, which is a fork of a third-party non-CPAN HTML5 parser, ported to run on top of XML::LibXML.

FWIW, this works for me...
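
To illustrate the <tbody> point, here is a minimal sketch using HTML::HTML5::Parser with XML::LibXML's XPathContext. The HTML fragment and the "x" namespace prefix are made up for the example and are not the markup from the question. Since HTML::HTML5::Parser puts elements in the XHTML namespace, the XPath needs a prefix bound to that namespace.

```perl
use strict;
use warnings;
use feature 'say';
use HTML::HTML5::Parser;
use XML::LibXML;

# Hypothetical sample markup (an assumption, not the HTML from the question).
# Note that the source contains no <tbody>.
my $html = '<table><tr><td><a href="/x"><h3>Title</h3></a></td></tr></table>';

# parse_string returns an XML::LibXML::Document.
my $dom = HTML::HTML5::Parser->new->parse_string($html);

# Elements end up in the XHTML namespace, so bind a prefix for XPath.
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs(x => 'http://www.w3.org/1999/xhtml');

# Compare the path without the implied <tbody> against the path with it.
my @without_tbody = $xpc->findnodes('//x:table/x:tr/x:td');
my @with_tbody    = $xpc->findnodes('//x:table/x:tbody/x:tr/x:td');

say 'cells found without tbody: ', scalar @without_tbody;
say 'cells found with tbody:    ', scalar @with_tbody;

# Under HTML5 rules the <h3> should stay a child of the <a>.
say $_->textContent for $xpc->findnodes('//x:td/x:a/x:h3');
```

Comparing the two counts shows whether the parser inserted the implied <tbody>. The same comparison could be run against HTML::TreeBuilder::XPath to see how it treats the same input.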
perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'