PerlMonks  

Re: REGEX for url

by tangent (Priest)
on Apr 25, 2016 at 22:15 UTC (node #1161489)


in reply to REGEX for url

Others have suggested HTML::LinkExtor. Here is a way to do it using HTML::TreeBuilder::XPath, which is very handy if you also need to extract other information from the file.
    use strict;
    use warnings;
    use HTML::TreeBuilder::XPath;

    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_file("/path/to/file.html");
    $tree->eof;

    # [@href] skips <a name="..."> anchors, whose href would be undef
    my @links = $tree->findnodes('//a[@href]');
    for my $link (@links) {
        print $link->attr('href'), "\n";
    }
That prints every link. If you only want the links in the table, restrict the XPath to <a> elements inside <td> cells:
    my @links = $tree->findnodes('//td/a[@href]');
    for my $link (@links) {
        print $link->attr('href'), "\n";
    }
Output:
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0001.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0002.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0003.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0004.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0005.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0006.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0007.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0008.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0009.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0010.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365.txt
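One caveat with the table version: in XPath, //td/a uses the child step, so it only matches <a> elements that are direct children of a <td>. A link wrapped in, say, <b> inside the cell is skipped, whereas //td//a matches at any depth. Here is a minimal sketch comparing the two, assuming the HTML is held in a string rather than a file; the helper hrefs_for and the sample markup are hypothetical.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: return the href of every node matched by $xpath in $html.
sub hrefs_for {
    my ($html, $xpath) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_content($html);    # parses the string and calls eof() itself
    my @hrefs = map { $_->attr('href') } $tree->findnodes($xpath);
    $tree->delete;                  # release the parse tree
    return @hrefs;
}

# Made-up sample: one link outside the table, one direct <td> child,
# and one nested inside <b> within a <td>.
my $html = '<p><a href="/nav">Home</a></p>'
         . '<table><tr>'
         . '<td><a href="/1.txt">one</a></td>'
         . '<td><b><a href="/2.txt">two</a></b></td>'
         . '</tr></table>';

print join(' ', hrefs_for($html, '//td/a[@href]')), "\n";    # direct <td> children only
print join(' ', hrefs_for($html, '//td//a[@href]')), "\n";   # any depth inside <td>
```

If the real page ever nests its links like that, //td//a is the safer choice.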

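As for extracting other information: the same tree can answer more than one question. A minimal sketch that collects each link's visible text alongside its href, again assuming the HTML is already in a string. The helper links_with_text and the sample markup are hypothetical; findnodes comes from HTML::TreeBuilder::XPath, and attr and as_text are HTML::Element methods.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: return [href, text] pairs for every <a> that has an href.
sub links_with_text {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_content($html);
    my @pairs = map { [ $_->attr('href'), $_->as_text ] }
                $tree->findnodes('//a[@href]');   # <a name="..."> anchors are skipped
    $tree->delete;
    return @pairs;
}

# Made-up sample markup
my $html = '<table><tr><td><a href="/doc-0001.txt">First document</a></td></tr></table>'
         . '<a name="top">back to top</a>';

for my $pair (links_with_text($html)) {
    printf "%s\t%s\n", @$pair;
}
```

If you then only want the .txt files, a plain regex on the extracted value does the job, e.g. grep { $_->[0] =~ /\.txt\z/ } links_with_text($html). Matching against an attribute value the parser has already pulled out is generally far more robust than running a regex over the raw HTML.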