This script finds and extracts image links in an HTML file, but it
doesn't handle link text with embedded HTML. For example,
if the link text is wrapped in a font size tag, it won't pick it up:
<A HREF="path/name"><FONT SIZE=-1>path name</FONT></A>
but it will pick it up without the embedded HTML:
<A HREF="path/name">path name</A>
Here is the Perl script:
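# Paragraph mode: read blank-line-separated chunks so that links
# wrapped across several lines can still match.
$/ = "";
$raw = "";
$linktext = "";
%atts = ();

while (<>)
{
    # Match <A ...>text</A>. Note that ([^<]+) only accepts link text
    # that contains no '<', i.e. no embedded tags.
    while (/<A\s([^>]+)>([^<]+)<\/A>/ig)
    {
        $raw = $1;
        $linktext = $2;
        $linktext =~ s/[\s]*\n/ /g;

        # Collect every attribute (quoted or unquoted value) first.
        while ($raw =~ /([^\s=]+)\s*=\s*("([^"]+)"|[^\s]+\s*)/ig)
        {
            if (defined $3)
            {
                $atts{ uc($1) } = $3;
            }
            else
            {
                $atts{ uc($1) } = $2;
            }
        }

        # Print one report per link, after all its attributes are known.
        print '-' x 15;
        print "\nLink text: $linktext\n";
        foreach $key ("HREF", "NAME", "TITLE", "REL", "REV", "TARGET")
        {
            if (exists($atts{$key}))
            {
                $atts{$key} =~ s/[\s]*\n/ /g;
                print " $key: $atts{$key}\n";
            }
        }
        %atts = ();
    }
}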
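
The likely cause is the link-text capture ([^<]+) in the outer pattern: it cannot match any '<', so a link whose text starts with <FONT ...> fails to match at all. One possible workaround, as a rough sketch only, is to capture the text lazily up to the nearest </A> and then strip any tags from it. The extract_links helper below is a made-up name for illustration and is not part of the script above. It assumes links are not nested, and a real HTML parser such as the HTML::Parser module would be more robust than any regex.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): returns a list of
# [attribute-string, link-text] pairs found in the given HTML.
sub extract_links {
    my ($html) = @_;
    my @links;
    # (.*?) with /s matches lazily up to the nearest </A>, tags included.
    while ($html =~ /<A\s([^>]+)>(.*?)<\/A>/isg) {
        my ($raw, $text) = ($1, $2);
        $text =~ s/<[^>]*>//g;    # drop embedded tags such as <FONT ...>
        $text =~ s/\s*\n/ /g;     # fold line breaks, as the script does
        push @links, [$raw, $text];
    }
    return @links;
}

# The two examples from the question.
my @found = extract_links(
    qq{<A HREF="path/name"><FONT SIZE=-1>path name</FONT></A>\n} .
    qq{<A HREF="path/name">path name</A>\n}
);
print "$_->[0] => $_->[1]\n" for @found;
```

With this sketch both example links should be reported with the link text "path name". The attribute string in the first element of each pair could then be fed to the same attribute loop the script already uses.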