The problem is correctly identified in Re^3: Unexpected results from a regex replacement (++). You are running the regexp over the whole of $outdata every time you append a line to it, so text that has already been commented out gets commented out again.
The reason the above regexp works is that it doesn't look for just "<cfmail"; it looks for a "<cfmail" that isn't already preceded by a "<!---" comment tag. Consider the following:
```perl
use strict;
use warnings;

my $outdata_v1  = "";
my $outdata_v2  = "";
my $data_offset = tell DATA;    # remember where __DATA__ starts so it can be re-read
my $line_count  = 1;

print "First Regexp solution\n";
print "-" x 20, "\n";
while ( <DATA> ) {
    $outdata_v1 .= $_;
    print "outdata for read of line $line_count before:\n$outdata_v1\n";
    # Unguarded: every <cfmail already in $outdata_v1 gets commented again
    $outdata_v1 =~ s{<cfmail}{<!--- <cfmail}g;
    $outdata_v1 =~ s{</cfmail>}{</cfmail> --->}g;
    print "outdata for read of line $line_count after:\n$outdata_v1\n";
    $line_count++;
}

#-- reset it all, start again with the better regexp.
seek( DATA, $data_offset, 0 );
$line_count = 1;
print "Second Regexp solution\n";
print "-" x 20, "\n";
while ( <DATA> ) {
    $outdata_v2 .= $_;
    print "outdata for read of line $line_count before:\n$outdata_v2\n";
    # Guarded: skip tags that are already wrapped in a comment
    $outdata_v2 =~ s{(?<!<!--- )<cfmail}{<!--- <cfmail}g;
    $outdata_v2 =~ s{</cfmail>(?! --->)}{</cfmail> --->}g;
    print "outdata for read of line $line_count after:\n$outdata_v2\n";
    $line_count++;
}
__DATA__
<cfmail to="#to_address#">
</cfmail>
<cfmail to="#to_address_2#">
```
The output is:
```
First Regexp solution
--------------------
outdata for read of line 1 before:
<cfmail to="#to_address#">
outdata for read of line 1 after:
<!--- <cfmail to="#to_address#">
outdata for read of line 2 before:
<!--- <cfmail to="#to_address#">
</cfmail>
outdata for read of line 2 after:
<!--- <!--- <cfmail to="#to_address#">
</cfmail> --->
outdata for read of line 3 before:
<!--- <!--- <cfmail to="#to_address#">
</cfmail> --->
<cfmail to="#to_address_2#">
outdata for read of line 3 after:
<!--- <!--- <!--- <cfmail to="#to_address#">
</cfmail> ---> --->
<!--- <cfmail to="#to_address_2#">
Second Regexp solution
--------------------
outdata for read of line 1 before:
<cfmail to="#to_address#">
outdata for read of line 1 after:
<!--- <cfmail to="#to_address#">
outdata for read of line 2 before:
<!--- <cfmail to="#to_address#">
</cfmail>
outdata for read of line 2 after:
<!--- <cfmail to="#to_address#">
</cfmail> --->
outdata for read of line 3 before:
<!--- <cfmail to="#to_address#">
</cfmail> --->
<cfmail to="#to_address_2#">
outdata for read of line 3 after:
<!--- <cfmail to="#to_address#">
</cfmail> --->
<!--- <cfmail to="#to_address_2#">
```
You can see that, as Eimi Metamorphoumai correctly pointed out, your original regexp is re-run over everything already in $outdata each time a new line is read, so every pass adds another comment marker to tags that were commented on an earlier pass. The second regexp solution does not, because it only matches "<cfmail" tags that are not already preceded by a "<!--- " comment opener, and "</cfmail>" tags that are not already followed by a " --->" comment closer.
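Another way around the problem, sketched here assuming each tag sits on a single line as in the sample data, is to apply the original, unguarded substitutions only to the line just read, before appending it, so already-commented text is never scanned again. The helper names `comment_line` and `comment_all_guarded` are hypothetical, made up for this illustration; the sketch also checks that the guarded regexps give the same result when run once over the whole text, and that re-running them changes nothing.

```perl
use strict;
use warnings;

# Hypothetical helper: the original, unguarded substitutions,
# applied only to a single freshly read line.
sub comment_line {
    my ($line) = @_;
    $line =~ s{<cfmail}{<!--- <cfmail}g;
    $line =~ s{</cfmail>}{</cfmail> --->}g;
    return $line;
}

# Hypothetical helper: the guarded (lookbehind/lookahead) substitutions,
# which leave already-commented tags alone and so are safe to re-run.
sub comment_all_guarded {
    my ($text) = @_;
    $text =~ s{(?<!<!--- )<cfmail}{<!--- <cfmail}g;
    $text =~ s{</cfmail>(?! --->)}{</cfmail> --->}g;
    return $text;
}

my @lines = (
    qq{<cfmail to="#to_address#">\n},
    qq{</cfmail>\n},
    qq{<cfmail to="#to_address_2#">\n},
);

# Each line is transformed exactly once, then appended.
my $outdata = '';
$outdata .= comment_line($_) for @lines;
print $outdata;

# The guarded version, run once over the joined text, should agree,
# and running it a second time should leave the text unchanged.
my $guarded = comment_all_guarded( join '', @lines );
print "same result\n" if $guarded eq $outdata;
print "idempotent\n"  if comment_all_guarded($guarded) eq $guarded;
```

In a real loop this means writing `$outdata .= comment_line($_);` in place of appending first and substituting afterwards. The guarded regexps remain the safer choice if the text can be processed more than once.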