http://www.perlmonks.org?node_id=137462

oaklander has asked for the wisdom of the Perl Monks concerning the following question:

This script finds and extracts image links in an HTML file, but it doesn't handle link text with embedded HTML. For example, if the link text is wrapped in a FONT tag with a size, it won't pick it up...
<A HREF="path/name"><FONT SIZE=-1>path name</FONT></A>
but it will pick up the same link without the embedded HTML...
<A HREF="path/name">path name</A>
Here is the Perl script:
$/ = "";
$raw = "";
$linktext = "";
%atts = ();

while (<>) {
    while (/<A\s([^>]+)>([^<]+)<\/A>/ig) {
        $raw = $1;
        $linktext = $2;
        $linktext =~ s/[\s]*\n/ /g;
        while ($raw =~ /([^\s=]+)\s*=\s*("([^"]+)"|[^\s]+\s*)/ig) {
            if (defined $3) {
                $atts{ uc($1) } = $3;
            } else {
                $atts{ uc($1) } = $2;
            }
            print '-' x 15;
            print "\nLink text: $linktext\n";
            foreach $key ("HREF", "NAME", "TITLE", "REL", "REV", "TARGET") {
                if (exists($atts{$key})) {
                    $atts{$key} =~ s/[\s]*\n/ /g;
                    print " $key: $atts{$key}\n";
                }
            }
            %atts = ();
        }
    }
}
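
The link-text part of the pattern, ([^<]+), cannot match any "<" character, so it gives up as soon as the text between <A ...> and </A> contains another tag such as <FONT>. Below is a minimal sketch of one possible workaround, not a definitive fix: it assumes the link text can be captured with a non-greedy (.*?) under the /s modifier and that any embedded tags can simply be stripped afterwards. The helper name extract_links is hypothetical and does not appear in the script above.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not part of the original script):
# returns a list of [attribute string, link text] pairs found in $html.
sub extract_links {
    my ($html) = @_;
    my @links;
    # (.*?) is non-greedy and, with /s, also matches newlines, so it stops
    # at the first </A> but tolerates tags such as <FONT> in the link text.
    while ($html =~ /<A\s([^>]+)>(.*?)<\/A>/isg) {
        my ($raw, $text) = ($1, $2);
        $text =~ s/<[^>]*>//g;      # strip embedded tags
        $text =~ s/\s+/ /g;         # collapse whitespace and newlines
        $text =~ s/^\s+|\s+$//g;    # trim leading and trailing space
        push @links, [$raw, $text];
    }
    return @links;
}

for my $link (extract_links('<A HREF="path/name"><FONT SIZE=-1>path name</FONT></A>')) {
    print "Attributes: $link->[0]\n";
    print "Link text: $link->[1]\n";
}
```

The attribute string in the first element could then be fed to the same attribute-parsing loop as before. Regex-based matching like this is still fragile (for instance, a ">" inside a quoted attribute value would confuse it), so an HTML parser module such as HTML::TokeParser is probably the more robust route.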