PerlMonks  

Re: REGEX for url

by tangent (Vicar)
on Apr 25, 2016 at 22:15 UTC


in reply to REGEX for url

Others have suggested HTML::LinkExtor. Here is a way to do it using HTML::TreeBuilder::XPath, which is very handy if you also need to extract other information from the file.
    use HTML::TreeBuilder::XPath;

    # Build a tree from the HTML file
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_file("/path/to/file.html");
    $tree->eof;

    # Select every <a> element anywhere in the document
    my @links = $tree->findnodes('//a');
    for my $link ( @links ) {
        print $link->attr('href'), "\n";
    }
That will print the href of every link in the file. If you only want the links that sit directly inside table cells, narrow the XPath expression:
    my @links = $tree->findnodes('//td/a');
    for my $link ( @links ) {
        print $link->attr('href'), "\n";
    }
Output:
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0001.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0002.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0003.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0004.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0005.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0006.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0007.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0008.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0009.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365-0010.txt
    /Archives/edgar/data/1050122/000092735601000365/0000927356-01-000365.txt
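
As for extracting other information at the same time, here is a rough, self-contained sketch that collects each link's text alongside its href. The sample markup, the `extract_links` helper and the shortened paths are made up for illustration and are not from the original file. It parses a string with `parse_content` instead of reading a file, and the `[@href]` predicate skips anchors that have no href, which would otherwise print as undef.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup standing in for the real file
my $html = <<'HTML';
<html><body>
<p><a href="/index.html">Home</a></p>
<table>
  <tr><td><a href="/Archives/example-0001.txt">Part 1</a></td><td>10 KB</td></tr>
  <tr><td><a href="/Archives/example-0002.txt">Part 2</a></td><td>12 KB</td></tr>
  <tr><td><a name="anchor-only">No href here</a></td><td></td></tr>
</table>
</body></html>
HTML

# Hypothetical helper: returns one { href, text } hash per link found in a table cell
sub extract_links {
    my ($markup) = @_;

    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_content($markup);    # parse a string and signal end of input

    my @found;
    # //td/a[@href] matches only anchors that are children of a <td> and carry an href
    for my $link ( $tree->findnodes('//td/a[@href]') ) {
        push @found, {
            href => $link->attr('href'),
            text => $link->as_text,
        };
    }

    $tree->delete;                    # release the tree when done
    return @found;
}

printf "%s\t%s\n", $_->{href}, $_->{text} for extract_links($html);
```

The same loop can pick up anything else reachable from the link element, so it is only a starting point under those assumptions.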
