Re: REGEX for url

by tangent (Vicar)
on Apr 25, 2016 at 22:15 UTC


in reply to REGEX for url

Others have suggested HTML::LinkExtor. Here is a way to do it using HTML::TreeBuilder::XPath, which is very handy if you also need to extract other information from the file.
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_file("/path/to/file.html");
$tree->eof;

my @links = $tree->findnodes('//a');
for my $link ( @links ) {
    print $link->attr('href'), "\n";
}
That will print every link. If you only want the links from the table, restrict the XPath to anchors that are direct children of a td:
my @links = $tree->findnodes('//td/a');
for my $link ( @links ) {
    print $link->attr('href'), "\n";
}
Output:
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0001.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0002.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0003.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0004.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0005.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0006.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0007.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0008.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0009.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0010.txt
/Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365.txt
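As a rough illustration of the "other information" point, here is a minimal sketch that collects each table link's text along with its href. The sample markup, the paths in it, and the table_links helper are all made up for the example (they are not from the original file), and the [@href] predicate is my addition so that anchors without an href are skipped rather than printing undef.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup standing in for /path/to/file.html
my $html = <<'HTML';
<html><body>
<p><a href="/index.html">Home</a></p>
<table>
<tr><td><a href="/docs/0001.txt">Document 1</a></td><td>text</td></tr>
<tr><td><a href="/docs/0002.txt">Document 2</a></td><td>text</td></tr>
<tr><td><a name="top">no href here</a></td></tr>
</table>
</body></html>
HTML

# Hypothetical helper: returns a list of [ href, link text ] pairs
# for anchors that sit directly inside a <td> and carry an href.
sub table_links {
    my ($content) = @_;

    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse($content);
    $tree->eof;

    my @rows;
    for my $link ( $tree->findnodes('//td/a[@href]') ) {
        push @rows, [ $link->attr('href'), $link->as_text ];
    }

    $tree->delete;    # release the parse tree once we have what we need
    return @rows;
}

# Print one tab-separated "href, text" line per table link
printf "%s\t%s\n", @$_ for table_links($html);
```

The "Home" link is left out because it is not inside a td, and the named anchor is left out because of the [@href] predicate. To read from a file instead, swap the parse/eof pair for parse_file as in the snippets above.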
