PerlMonks  

Re^3: HTML::TableExtract issues

by poj (Priest)
on Aug 24, 2013 at 16:17 UTC (#1050821)


in reply to Re^2: HTML::TableExtract issues
in thread HTML::TableExtract issues

"What I would like to do is populate the first cell in each row with the 58035.png (in this case)."

One way is to use tree mode and look_down, like this:

#!perl
use strict;
use warnings;
use HTML::TableExtract 'tree';
use Text::CSV;
use LWP::Simple;

# input
my $html = get('your url');
die "Could not fetch page\n" unless defined $html;
my $te = HTML::TableExtract->new();
$te->parse($html);

# output
my $csvfile = 'results.csv';
my $csv = Text::CSV->new ( { binary => 1, eol => "\n" } )
  or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, '>:encoding(utf-8)', $csvfile
  or die "$csvfile : $!";

# process
my $count=0;
printf "%3s %4s %4s\n",'Tbl','Rows','Cols';
foreach my $ts ($te->tables){
  my $tree = $ts->tree();
  printf "%3d %4d %4d\n",++$count,$tree->maxrow,$tree->maxcol;
  foreach my $r (0..$tree->maxrow){
    my @cells=();
    # does the first cell hold an element whose src ends in png ?
    my $x = $tree->cell($r,0)->look_down('src',qr/png$/);
    push @cells,(defined $x) ? $x->attr('src') : $tree->cell($r,0)->as_text;
    # remaining columns are taken as plain text
    for my $c (1..$tree->maxcol){
      my $val = $tree->cell($r,$c)->as_text;
      push @cells,$val;
    }
    $csv->print ($fh, \@cells);
  }
}
close $fh or die "$csvfile: $!";
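
The look_down test on the first cell can be tried on its own, without fetching a page. This is only a rough sketch: the HTML snippet is made up for illustration, and it uses HTML::TreeBuilder directly (the module that tree mode builds on) rather than HTML::TableExtract. It shows that look_down('src', qr/png$/) returns the first element whose src attribute matches, or undef when the cell holds only text.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical one-row table: first cell holds an image, second holds text.
my $html = '<table><tr>'
         . '<td><img src="58035.png"></td>'
         . '<td>plain text</td>'
         . '</tr></table>';

my $tree = HTML::TreeBuilder->new_from_content($html);
my @tds  = $tree->look_down( _tag => 'td' );

my @cells;
for my $td (@tds) {
    # First descendant (or the cell itself) whose src ends in "png", else undef.
    my $img = $td->look_down( 'src', qr/png$/ );
    push @cells, defined $img ? $img->attr('src') : $td->as_text;
}

print join( ' | ', @cells ), "\n";
$tree->delete;    # free the parse tree
```

The fallback to as_text is what keeps ordinary cells working when no image is present.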

Notice I have used Text::CSV rather than just adding commas between columns.
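
To illustrate why, here is a small sketch with made-up cell values (they are not from the page in question). A cell that itself contains a comma or a quote breaks a plain join, while Text::CSV quotes and escapes it.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical row: the second and third fields contain a comma and quotes.
my @cells = ( '58035.png', 'Widget, large', 'He said "ok"' );

# Naive approach: the embedded comma looks like an extra column.
my $naive = join ',', @cells;

# Text::CSV quotes fields and doubles embedded quotes where needed.
my $csv = Text::CSV->new( { binary => 1 } )
  or die "Cannot use CSV: " . Text::CSV->error_diag();
$csv->combine(@cells)
  or die "combine failed: " . $csv->error_input();
my $line = $csv->string();

print "naive: $naive\n";
print "csv:   $line\n";
```

A reader splitting the naive line on commas would see four columns instead of three.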

poj


Re^4: HTML::TableExtract issues
by Mr Bigglesworth (Initiate) on Aug 26, 2013 at 12:50 UTC

    Hi poj

    Thank you for your efforts, it is appreciated. It works great.

    I need to change it a little so that I can process the text inside the resulting CSV, but you have pushed me ahead quite a bit.

    I did notice the use of Text::CSV. That is definitely a much better and more reliable way to do it.

    Cheers

    Mr Bigglesworth
