Ovid, this is what I came up with after messing with it for
a while. It handles the two input strings you were having
problems with (in this version the quote characters are
required), but I have no idea whether it will work for all
possible data. We can talk about making the quote characters
optional tomorrow.
#!/usr/bin/perl
use strict;
use warnings;
my ($data, $res);
$data = '<a & href="somesite.html">test</a>';
print "Before substitution: $data\n";
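$res = $data =~ s/
    (               # Capture group 1:
        <a\s        #   "<a" followed by one whitespace character
    )
    (?:             # Non-capturing: the junk to be removed,
        [^>]*       #   i.e. anything between "<a " and "href"
    )
    (               # Capture group 2:
        href\s*     #   "href" followed by optional whitespace
    )
    (               # Capture group 3, the whole = "url" part:
        =\s*        #   "=" followed by optional whitespace
        (           #   Capture group 4:
            ["']    #     exactly one opening quote character
        )
        (           #   Capture group 5:
            [^"']+  #     the URL: anything that is not a quote
        )
        (?:
            \4      #     closing quote, same as the opening one
        )
    )
    (               # Capture group 6:
        >           #   the ">" that ends the opening tag
    )
    (               # Capture group 7:
        [^>]+       #   link text and closing tag, up to
        >           #   its final ">"
    )
/$1$2$3$6$7/x;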
print "no match\n" unless $res;    # s/\/\/ returns false when nothing matched
print "After substitution: $data\n";
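To check the pattern against more than one string, the same substitution can be wrapped in a small subroutine and run over a few inputs. This is only a sketch: the name `strip_a_junk` is hypothetical, the regex is a compacted copy of the one above (same groups, same replacement), and the sample strings are illustrative, not the two from your original problem. The third input has no quotes around the URL, so it is expected not to match, since quotes are required in this version. Note that the comments avoid writing `$1` and friends, because Perl interpolates variables inside `/x` comments before the regex engine sees them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: applies the same substitution as above to a
# copy of its argument and returns the cleaned string, or undef if
# the pattern did not match.
sub strip_a_junk {
    my ($html) = @_;
    my $count = $html =~ s/
        (<a\s)                    # group 1: "<a" plus one whitespace char
        [^>]*                     # junk before "href" (dropped)
        (href\s*)                 # group 2: "href" and optional whitespace
        (=\s*(["'])([^"']+)\4)    # group 3: = quote url quote (groups 4, 5 inside)
        (>)                       # group 6: end of the opening tag
        ([^>]+>)                  # group 7: link text and closing tag
    /$1$2$3$6$7/x;
    return $count ? $html : undef;
}

# Illustrative inputs only.
for my $in ('<a & href="somesite.html">test</a>',
            q{<a junk href = 'other.html'>more</a>},
            '<a href=noquotes.html>x</a>') {
    my $out = strip_a_junk($in);
    print defined($out) ? "$out\n" : "no match: $in\n";
}
```

Run as-is, this should print the first two links with the junk removed and report "no match" for the unquoted one.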