PerlMonks  

table capture

by Kanishka (Beadle)
on Apr 18, 2005 at 04:35 UTC ( [id://448731]=perlquestion )

Kanishka has asked for the wisdom of the Perl Monks concerning the following question:

I use this code to capture a page from the web.
use LWP::Simple;
$reponse = get("http://www.cse.lk/marketinfo/splsum.jsp");
open(INFILE, ">SCE.htm") or die "Can't open out File: $!\n";
print INFILE $reponse;
close INFILE;
I want to get each element of data in a row into separate variables so I can store them in a database. I have tried several times, but the HTML seems to be irregular and I'm a novice. :)

Replies are listed 'Best First'.
Re: table capture
by bobf (Monsignor) on Apr 18, 2005 at 05:03 UTC

    Parsing HTML can be tricky, which is why you should use a module to do it whenever possible. There are several HTML parsers on CPAN, and some are specifically designed to parse tables. For example, check out HTML::TableContentParser.
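As a rough illustration of the module route, here is a minimal sketch using HTML::TableContentParser. The `extract_rows` helper and the sample markup are made up for this example and do not reflect the real layout of the page in question. Note that each cell's `data` field holds the raw contents of the cell, so nested markup would still need stripping.

```perl
use strict;
use warnings;
use HTML::TableContentParser;

# Hypothetical helper: return every table row in $html as an array ref
# of trimmed cell strings.
sub extract_rows {
    my ($html) = @_;
    my $parser = HTML::TableContentParser->new();
    my $tables = $parser->parse($html);    # array ref of table hashes
    my @rows;
    for my $table (@{$tables}) {
        for my $row (@{ $table->{rows} || [] }) {
            my @cells = map {
                my $text = defined $_->{data} ? $_->{data} : '';
                $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
                $text;
            } @{ $row->{cells} || [] };
            push @rows, \@cells if @cells;
        }
    }
    return @rows;
}

# Made-up sample markup standing in for the downloaded page
my $html = <<'HTML';
<table>
  <tr><td>ABC</td><td>12.50</td></tr>
  <tr><td>XYZ</td><td> 7.00 </td></tr>
</table>
HTML

for my $row (extract_rows($html)) {
    print join(' | ', @{$row}), "\n";
}
```

In practice `$html` would be the string returned by `get(...)` rather than the inline sample.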

    In addition, since you seem to be downloading stock quotes, you might want to check out some of the modules in the Finance::Quote namespace. You might be able to use one of those to get the data and skip parsing the table completely.

    If you need more help, feel free to ask. Using modules will definitely make it easier.

    HTH

Re: table capture
by zentara (Archbishop) on Apr 18, 2005 at 12:29 UTC
    Here is an example using HTML::TableExtract. Look at the printout, and you should be able to regex it (or deduce the array structure) and assign it to variables.
    #!/usr/bin/perl
    use HTML::TableExtract;
    use LWP::Simple;
    use Data::Dumper;

    my $te = new HTML::TableExtract(gridmap=>1);
    my $content = get("http://www.cse.lk/marketinfo/splsum.jsp");
    $te->parse($content);

    foreach $ts ($te->table_states) {
        foreach $row ($ts->rows) {
            #print Dumper $row;
            print @{$row},"\n";
        }
    }
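Once the array structure is known, one way to get each cell into a named variable is a hash slice keyed by column name. This is a minimal sketch, assuming each row is an array reference of cell strings like those the loop above prints. The column names, the `row_to_record` helper, and the sample rows are all hypothetical; the real column order has to be read off the Dumper output.

```perl
use strict;
use warnings;

# Hypothetical column names; adjust to match the Dumper output.
my @columns = qw(symbol last_price change volume);

# Turn one row (array ref of cells) into a hash ref keyed by column name.
sub row_to_record {
    my ($row) = @_;
    # Take exactly one cell per column; missing or undef cells become ''.
    my @cells = map { defined $_ ? $_ : '' } @{$row}[0 .. $#columns];
    s/^\s+|\s+$//g for @cells;    # trim surrounding whitespace
    my %record;
    @record{@columns} = @cells;
    return \%record;
}

# Made-up sample rows standing in for the rows of an extracted table
my @rows = (
    [ ' ABC ', '12.50', '+0.25', '1000' ],
    [ 'XYZ',   '7.00',  undef,   '250'  ],
);

for my $row (@rows) {
    my $rec = row_to_record($row);
    print join(', ', map { "$_=$rec->{$_}" } @columns), "\n";
}
```

Each record can then be handed to whatever stores the data, for example as bind values for a prepared INSERT statement.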

    I'm not really a human, but I play one on earth. flash japh
Re: table capture
by Ben Win Lue (Friar) on Apr 18, 2005 at 11:42 UTC
    This is not an answer to your question, but your code would be easier to read if you had named your filehandle OUTFILE, since it is opened for writing.

Node Type: perlquestion [id://448731]
Approved by thor