How do I extract URLs?

by faq_monk (Initiate)
on Oct 08, 1999 at 00:32 UTC (#759=perlfaq)

Current Perl documentation can be found at

Here is our local, outdated (pre-5.6) version:

A quick but imperfect approach is:

    #!/usr/bin/perl -n00
    # qxurl -
    print "$2\n" while m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \1
        \s* >
    }gsix;

This version does not adjust relative URLs, understand alternate bases, deal with HTML comments, deal with HREF and NAME attributes in the same tag, or accept URLs themselves as arguments. It also runs about 100x faster than a more "complete" solution using the LWP suite of modules, such as the program.
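Some of the limitations listed above can be seen by running the same pattern over a small string. The following is only a sketch using core Perl: the helper name extract_hrefs and the sample HTML are made up for illustration and are not part of the FAQ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the FAQ): run the quick pattern over a
# string and return every captured HREF value, unmodified.
sub extract_hrefs {
    my ($html) = @_;
    my @urls;
    while ($html =~ m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \1
        \s* >
    }gsix) {
        push @urls, $2;    # $2 is the text between the matching quotes
    }
    return @urls;
}

# Made-up sample input, one case per line.
my $sample = <<'HTML';
<a href="http://www.example.com/">absolute</a>
<A HREF='docs/index.html'>relative</A>
<!-- <a href="http://old.example.com/">commented out</a> -->
<a name="top" href="#top">NAME before HREF</a>
HTML

print "$_\n" for extract_hrefs($sample);
```

On this input the sketch prints the absolute URL, then docs/index.html exactly as written (no base is applied), then the link inside the HTML comment. The last tag is skipped because the pattern expects HREF to follow the A immediately.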
