PerlMonks  

How do I extract URLs?

by faq_monk (Initiate)
on Oct 08, 1999 at 00:32 UTC ( #759=perlfaq )

Current Perl documentation can be found at perldoc.perl.org.

Here is our local, out-dated (pre-5.6) version:

A quick but imperfect approach is

    #!/usr/bin/perl -n00
    # qxurl - tchrist@perl.com
    # -n loops over the input; -00 reads it one paragraph at a time
    print "$2\n" while m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \1   # quote, URL, same quote
        \s* >
    }gsix;

This version does not adjust relative URLs, understand alternate bases, deal with HTML comments, deal with HREF and NAME attributes in the same tag, or accept URLs themselves as arguments. As written, the pattern also requires HREF to be the first attribute after A and its value to be quoted. On the other hand, it runs about 100x faster than a more "complete" solution using the LWP suite of modules, such as the http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz program.
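Two of the limitations listed above (HTML comments, and HREF sharing a tag with NAME or other attributes) can be eased without leaving core Perl. The following is only a rough sketch, not the xurl program and not a real HTML parser: the `extract_hrefs` helper and the sample markup are made up for illustration, and it still does not resolve relative URLs or honor alternate bases. It also assumes that no attribute value contains a literal `>`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of the original qxurl script.
# Returns the HREF value of every opening <A ...> tag in $html.
sub extract_hrefs {
    my ($html) = @_;
    my @urls;

    # Drop HTML comments so commented-out links are not reported.
    $html =~ s/<!--.*?-->//gs;

    # Grab the attribute text of each opening <a ...> tag.
    # Assumption: no attribute value contains a literal '>'.
    while ($html =~ m{<\s*a\b([^>]*)>}gi) {
        my $attrs = $1;

        # Find HREF anywhere among the attributes; the value may be
        # double-quoted, single-quoted, or unquoted.
        if ($attrs =~ m{(?:^|\s)href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))}i) {
            my ($url) = grep { defined } $1, $2, $3;
            push @urls, $url;
        }
    }
    return @urls;
}

# Made-up sample input for illustration.
my $sample = <<'HTML';
<!-- <a href="http://old.example/">gone</a> -->
<a name="top" href="http://www.perl.com/">Perl</a>
<A HREF='docs/faq.html'>FAQ</A>
<a href=plain.html>plain</a>
HTML

print "$_\n" for extract_hrefs($sample);
```

Run on the sample above, this prints http://www.perl.com/, docs/faq.html and plain.html, one per line, and skips the commented-out link. Relative URLs such as docs/faq.html come back exactly as written; turning them into absolute URLs is where a module-based solution earns its extra running time.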
