eversuhoshin has asked for the wisdom of the Perl Monks concerning the following question:
Dear Monks,
I need help with web crawling. I need to obtain the HTML source of a web page. I have tried WWW::Mechanize, and URI to convert a relative URL to an absolute one, but I have failed so far.
Can someone please help me crawl or download the HTML of the following page:
www.sec.gov/Archives/edgar/data/935226/000114420411058092/0001144204-11-058092-index.htm
Here is the code I am using to try to crawl the EDGAR website:
use strict;
use WWW::Mechanize;
use LWP::Simple;
use URI;
my $url='edgar/data/1750/0001104659-06-059326-index.html';
my $web='www.sec.gov/Archives/'.$url;
my @temp=split(/\//,$url);
chomp($web);
my $rel_url='/'.$temp[2].'/'.$temp[3];
my $base_url='www.sec.gov/Archives/edgar/data';
my $abs_url=URI->new_abs($rel_url,$base_url);
my $text=get($abs_url) or die $!;
This is the SEC EDGAR database, and once I figure out how to crawl it I can do the parsing myself. I just need the information under the <div class="infoHead">Items</div> heading. Thank you so much!
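For anyone reading along, here is a minimal sketch of one way the fetch could work. It rests on several assumptions, none of them confirmed in the post. The base URL given to URI->new_abs needs a scheme and a trailing slash (the original base has neither). The site is assumed to be served over https today, which needs LWP::Protocol::https installed. The User-Agent string is a placeholder to replace with your own contact details. The Items text is assumed to sit in a <div class="info"> right after the infoHead div. The helper names edgar_url, fetch_html and extract_items are made up for illustration.

```perl
use strict;
use warnings;
use URI;
use LWP::UserAgent;

# Resolve a path relative to the archive root. The base carries a scheme
# and a trailing slash so that URI->new_abs can resolve against it.
sub edgar_url {
    my ($rel_path) = @_;
    my $base = 'https://www.sec.gov/Archives/';   # assumption: https is required
    return URI->new_abs($rel_path, $base)->as_string;
}

# Fetch a page and die with the HTTP status line on failure
# ($! is not set by HTTP errors, so it is not useful here).
sub fetch_html {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(
        agent   => 'example-crawler/0.1 (you@example.com)',   # placeholder
        timeout => 30,
    );
    my $res = $ua->get($url);
    die "GET $url failed: " . $res->status_line . "\n" unless $res->is_success;
    return $res->decoded_content;
}

# Return the text of the block following <div class="infoHead">Items</div>.
# Assumption: that block is a sibling <div class="info">. A regex is enough
# for a sketch; a real HTML parser would be more robust.
sub extract_items {
    my ($html) = @_;
    return unless $html =~ m{<div\s+class="infoHead">\s*Items\s*</div>\s*
                             <div\s+class="info">(.*?)</div>}six;
    my $items = $1;
    $items =~ s{<br\s*/?>}{\n}gi;   # line breaks become newlines
    $items =~ s{<[^>]+>}{}g;        # drop any remaining tags
    $items =~ s{^\s+|\s+$}{}g;      # trim leading and trailing whitespace
    return $items;
}

# Only touch the network when a relative path is given on the command line.
if (my $path = shift @ARGV) {
    my $items = extract_items(fetch_html(edgar_url($path)));
    print defined $items ? "$items\n" : "No Items block found\n";
}
```

You would run it with the relative path as its only argument, for example edgar/data/935226/000114420411058092/0001144204-11-058092-index.htm. If the page layout differs from what is assumed here, only extract_items should need changing.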
Replies are listed 'Best First'.
Re: Help with web crawling
by CountZero (Bishop) on Dec 09, 2012 at 09:27 UTC
by eversuhoshin (Sexton) on Dec 09, 2012 at 16:28 UTC
Re: Help with web crawling
by tobyink (Canon) on Dec 09, 2012 at 11:01 UTC
Re: Help with web crawling
by space_monk (Chaplain) on Dec 09, 2012 at 07:55 UTC