Thank you all, as always, for your valuable input and ideas! Ye monks are a smart bunch.
As much as I'd love to help debug W::M::Chrome, I'm on a short deadline, so I went with LanX's idea: use xpath to grab the table node and its HTML content, then parse that in Perl land. For the parsing I picked HTML::Tree, which is simple and tried and tested.
For anyone having a similar issue, here is the code I wrote for this (it assumes the table has thead, th, and tbody, YMMV):
use feature 'signatures';    # needed for the sub signature below (Perl 5.20+)
use HTML::TreeBuilder;

my @nodes = $mech->xpath('//table');
my $data  = parse_table($nodes[0]);    # array ref of hash refs, one per body row

sub parse_table ($table_node) {
    my $root = HTML::TreeBuilder->new_from_content(
        $table_node->get_attribute('outerHTML'));
    my @tparts = $root->find_by_tag_name('table')->content_list;
    my @colnames;
    my @data;
    foreach my $tpart (@tparts) {
        next unless ref $tpart;    # skip stray text nodes
        if ($tpart->tag eq 'thead') {
            foreach my $row ($tpart->content_list) {
                next unless ref $row && $row->tag eq 'tr';
                # assumes no TH is empty (see below safeguard for data cells)
                push @colnames, $_->content->[0] foreach $row->content_list;
            }
        }
        elsif ($tpart->tag eq 'tbody') {
            foreach my $row ($tpart->content_list) {
                next unless ref $row && $row->tag eq 'tr';
                my %row_data;
                my @cells = $row->content_list;
                foreach my $cell (0 .. $#cells) {
                    # HTML::Element's content method weirdness:
                    # it returns undef, not an empty array ref, for an empty cell
                    my $content = $cells[$cell]->content;
                    $row_data{ $colnames[$cell] } =
                        ($content && @$content) ? $content->[0] : '';
                }
                push @data, \%row_data;
            }
        }
    }
    return \@data;
}
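In case it helps, here is a minimal sketch of just the "pair column names with cell values" step, with no browser or HTML parser involved. The sample column names and rows are made up for illustration (they are not from the real page), and the hash slice is one possible way to write that step, not what the code above does line for line.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the text pulled out of the
# thead and tbody cells; these values are not from the original page.
my @colnames = ('Name', 'Qty', 'Note');
my @rows     = (
    [ 'apple', 3, 'fresh' ],
    [ 'pear',  5 ],            # short row: the last cell is missing
);

# Pair each row's cells with the column names via a hash slice,
# substituting '' for cells that are missing or undefined.
my @data = map {
    my @cells = @$_;
    my %row;
    @row{@colnames} = map { $cells[$_] // '' } 0 .. $#colnames;
    \%row;
} @rows;

# Print one line per row, columns in header order.
for my $row (@data) {
    print join(', ', map { "$_=$row->{$_}" } @colnames), "\n";
}
# prints:
# Name=apple, Qty=3, Note=fresh
# Name=pear, Qty=5, Note=
```

Since parse_table returns an array reference, the same kind of loop over its result would iterate over the dereferenced array, one hash ref per table row.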
Thanks again, y'all!
--
Alex