Kanishka has asked for the wisdom of the Perl Monks concerning the following question:

I use this code to capture a page from the web.
use LWP::Simple;
$reponse = get("http://www.cse.lk/marketinfo/splsum.jsp");
open(INFILE, ">SCE.htm") or die "Can't open out File: $!\n";
print INFILE $reponse;
close INFILE;
I want to get each element of data in a row into separate variables so I can store them in a database. I have tried several times, but the HTML seems to be irregular and I'm a novice. :)

Replies are listed 'Best First'.
Re: table capture
by bobf (Monsignor) on Apr 18, 2005 at 05:03 UTC

    Parsing HTML can be tricky, which is why you should use a module to do it whenever possible. There are several HTML parsers on CPAN, and some are specifically designed to parse tables. For example, check out HTML::TableContentParser.

    In addition, since you seem to be downloading stock quotes, you might want to check out some of the modules in the Finance::Quote namespace. You might be able to use one of those to get the data and skip parsing the table completely.

    If you need more help, feel free to ask. Using modules will definitely make it easier.


Re: table capture
by zentara (Archbishop) on Apr 18, 2005 at 12:29 UTC
    Here is an example using HTML::TableExtract. Look at the output, and you should be able to deduce the array structure (or apply a regex to it) and assign the cells to variables.
    #!/usr/bin/perl
    use HTML::TableExtract;
    use LWP::Simple;
    use Data::Dumper;

    my $te = HTML::TableExtract->new(gridmap => 1);
    my $content = get("http://www.cse.lk/marketinfo/splsum.jsp");
    die "Couldn't fetch the page\n" unless defined $content;
    $te->parse($content);

    # Each row is an array reference holding that row's cells
    foreach my $ts ($te->table_states) {
        foreach my $row ($ts->rows) {
            #print Dumper $row;
            print @{$row}, "\n";
        }
    }
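Once the rows come back as array references, getting each cell into its own variable is mostly list assignment. The following is only a rough sketch using core Perl: the sample rows, the column names ($company, $last_trade, $volume), and the clean_cell helper are all made up for illustration, since the real layout of the page is not shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical sample rows, shaped like the array references that
# $ts->rows hands back: one array reference per row, one element per cell.
my @rows = (
    [ 'Company', 'Last Trade', 'Volume' ],    # header row
    [ ' ABC Holdings ', '12.50', '1,200' ],
    [ 'XYZ Finance', undef, '300' ],          # an empty cell may be undef
);

# Hypothetical helper: trim whitespace and turn undef into an empty string.
sub clean_cell {
    my ($cell) = @_;
    return '' unless defined $cell;
    $cell =~ s/^\s+|\s+$//g;
    return $cell;
}

my @records;
shift @rows;    # drop the header row
for my $row (@rows) {
    # List assignment puts each cell of the row into its own variable.
    my ($company, $last_trade, $volume) = map { clean_cell($_) } @$row;
    $volume =~ tr/,//d;    # remove thousands separators
    push @records, {
        company    => $company,
        last_trade => $last_trade,
        volume     => $volume,
    };
}

# Print one record per line; a database insert would go here instead.
for my $rec (@records) {
    print join('|', @{$rec}{qw(company last_trade volume)}), "\n";
}
```

Collecting each row into a hash keeps the column names next to the values, which should make a later database insert easier to read than positional array indexes.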

    I'm not really a human, but I play one on earth. flash japh
Re: table capture
by Ben Win Lue (Friar) on Apr 18, 2005 at 11:42 UTC
    This is not an answer to your question, but your code would be easier to read if you had named your filehandle OUTFILE, since you are writing to it.
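Going one step further than a rename, a lexical filehandle with the three-argument form of open is a common way to write the file. This is just a sketch of that idiom, not the original poster's code: $content here is a placeholder string standing in for whatever get() returned, and $out and $file are illustrative names.

```perl
use strict;
use warnings;

# Placeholder standing in for the page text returned by get().
my $content = "<html><body>placeholder</body></html>\n";
my $file    = 'SCE.htm';

# Three-argument open keeps the mode separate from the file name,
# and the lexical handle $out is closed automatically when it goes out of scope.
open(my $out, '>', $file) or die "Can't open $file for writing: $!\n";
print {$out} $content;
close($out) or die "Can't close $file: $!\n";
```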