ghosh123 has asked for the wisdom of the Perl Monks concerning the following question:

I am extracting some data from my project webpage and dumping them in a excel sheet. The webpage has got a table on it showing some results and status of few job runs. It uses green to denote a 'pass' and red to denote a 'fail' like below :


I am able to successfully dump the content of the page in a excel sheet but not the colors of the table cells. How can I restore the colors as well when dumping the content in excel?
Below is my code

use Spreadsheet::WriteExcel; use HTML::TableExtract; require LWP::UserAgent; my $ua = new LWP::UserAgent(); my $request = HTTP::Request->new(GET=>"") +; my $response ; $response = $ua->request($request); $response = $response->content(); # Create a new Excel file my $filename = "/home/user/result.xls"; my $workbook = Spreadsheet::WriteExcel->new($filename); # Add a worksheet my $worksheet = $workbook->add_worksheet('exec2'); # Define the format and add it to the worksheet my $format = $workbook->add_format( center_across => 1, bold => 1, size => 10, border => 4, color => 'black', bg_color => 'cyan', border_color => 'black', align => 'vcenter', ); my $te = HTML::TableExtract->new( keep_html => 0); $te->parse($response); foreach my $ts ($te->tables) { my $nrow=0; foreach my $row ($ts->rows) { my $ncol=0; foreach my $col (@$row) { $worksheet->write($nrow,$ncol++,$col) if defined $col; } $nrow++; } $nrow++; }


Replies are listed 'Best First'.
Re: How to restore the colors of a webpage when extracting data
by kcott (Canon) on Jul 16, 2013 at 05:05 UTC

    G'day ghosh123,

    From looking at the documentation, it would appear there's three basic steps you'll need to follow:

    1. Use HTML::TableExtract to get the table cells as HTML::Element objects.
    2. Use HTML::Element to get the attributes of each cell.
    3. Use the optional $format parameter of Spreadsheet::WriteExcel's write($row, $column, $token, $format) method to apply the format you retrieved from the HTML table. (See CELL FORMATTING for details.)

    [Aside: I'd recommend you look at using more meaningful variable names. You appear to be assigning cell data to a variable you've called $col. That variable would suggest column, not cell, data. As such, that's a potential source of confusion and errors either now or in the future.]

    -- Ken

      Hi ken

      Thanks for your reply.

      But I am not quite able to figure out how to use HTML::TableExtract to get the table cells as HTML::Element objects.

      I am giving a very simple html table below, using the code I have already given above, can you please show me how can I use HTML::Element and get the attributes.

      Assume that the page contains the following table :

      <html> <title>Example table </title> <head> MyTable</head><br><br> <body> <table border border ="1"> <tr> <td><FONT COLOR = "green">ID</td> <td> <FONT COLOR = "blue">NAME</td> <td> <FONT COLOR = "RED">DOB</td> </tr> <tr> <td>1</td> <td>XYZ</td> <td>1-1-2000</td> </tr> <tr> <td>2</td> <td>PQR</td> <td>1-11-2000</td> </tr> </table> </body> </html>
        "But I am not quite able to figure out how to use HTML::TableExtract to get the table cells as HTML::Element objects."

        Really? It's mentioned repeatedly thoughout the HTML::TableExtract documentation (a link I provided yesterday). Within that page, search for the string "HTML::Element" and read every section it appears in (which, by a very brief visual inspection, appears to be all of them except AUTHOR and COPYRIGHT).

        -- Ken