
Re: Unexpected results from a regex replacement

by bgreenlee (Friar)
on Nov 10, 2004 at 22:08 UTC


in reply to Unexpected results from a regex replacement

That's because your regex just looks for <cfmail, which still occurs in the commented-out version. My guess is that the comment delimiter is repeated because you've run the script multiple times. Try this (untested):

    $outdata =~ s{(?<!<!--- )<cfmail}{<!--- <cfmail}g;
    $outdata =~ s{</cfmail>(?! --->)}{</cfmail> --->}g;

That will only replace opening cfmail tags that aren't preceded by a comment delimiter, and closing cfmail tags that aren't followed by one.
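
As a rough check of that claim, here is a minimal sketch that runs the two substitutions twice over the same string and compares the results. The helper name comment_out_cfmail and the sample tag are made up for illustration; only the two regexes come from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap cfmail tags in ColdFusion comments, skipping
# tags that already sit next to a comment delimiter.
sub comment_out_cfmail {
    my ($text) = @_;
    # Negative lookbehind: only match <cfmail when "<!--- " is not right before it.
    $text =~ s{(?<!<!--- )<cfmail}{<!--- <cfmail}g;
    # Negative lookahead: only match </cfmail> when " --->" is not right after it.
    $text =~ s{</cfmail>(?! --->)}{</cfmail> --->}g;
    return $text;
}

# Made-up sample input, not taken from the original page.
my $once  = comment_out_cfmail(qq{<cfmail to="x">body</cfmail>\n});
my $twice = comment_out_cfmail($once);

print $once;
print $once eq $twice
    ? "second pass changed nothing\n"
    : "second pass changed the text\n";
```

If the lookarounds behave as described, the second pass should leave the string alone, which is why repeated runs stop stacking delimiters.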

-b

Replies are listed 'Best First'.
Re^2: Unexpected results from a regex replacement
by yacoubean (Scribe) on Nov 10, 2004 at 22:21 UTC
    Good thought, but that is not the case. I am deleting the comment tags after every run.

    Here is the original code copied/pasted directly from one of my pages:
    <cfmail to="#to_address#" .... </cfmail>
    Now, I just ran the script again and got (again, copied directly from the page after the script ran):
    <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <cfmail to="#to_address#" ..... </cfmail> ---> ---> ---> ---> ---> ---> ---> --->
Re^2: Unexpected results from a regex replacement
by yacoubean (Scribe) on Nov 10, 2004 at 22:28 UTC
    Ok, that's odd. Even though bgreenlee was wrong in saying that I wasn't clearing out the comment tags from previous runs, his code still did the trick. It doesn't make sense to me why
        $outdata =~ s{(?<!<!--- )<cfmail}{<!--- <cfmail}g;
        $outdata =~ s{</cfmail>(?! --->)}{</cfmail> --->}g;
    works better than
        $outdata =~ s{<cfmail}{<!--- <cfmail}g;
        $outdata =~ s{</cfmail>}{</cfmail> --->}g;
    but it does, so I'm not going to complain. :)
      The problem is correctly identified in Re^3: Unexpected results from a regex replacement (++). You are running the regexp on the whole of $outdata every time you add a line to it.

      The reason the above regexp works is that it doesn't look for just "<cfmail"; it looks for "<cfmail" that isn't preceded by a "<!--- " comment delimiter. Consider the following:

      my $outdata_v1  = "";
      my $outdata_v2  = "";
      my $data_offset = tell DATA;
      my $line_count  = 1;

      print "First Regexp solution\n";
      print "-" x 20, "\n";
      while ( <DATA> ) {
          $outdata_v1 .= $_;
          print "outdata for read of line $line_count before:\n$outdata_v1\n";
          $outdata_v1 =~ s{<cfmail}{<!--- <cfmail}g;
          $outdata_v1 =~ s{</cfmail>}{</cfmail> --->}g;
          print "outdata for read of line $line_count after:\n$outdata_v1\n";
          $line_count++;
      }

      #-- reset it all, start again with the better regexp.
      seek( DATA, $data_offset, 0 );
      $line_count = 1;

      print "Second Regexp solution\n";
      print "-" x 20, "\n";
      while ( <DATA> ) {
          $outdata_v2 .= $_;
          print "outdata for read of line $line_count before:\n$outdata_v2\n";
          $outdata_v2 =~ s{(?<!<!--- )<cfmail}{<!--- <cfmail}g;
          $outdata_v2 =~ s{</cfmail>(?! --->)}{</cfmail> --->}g;
          print "outdata for read of line $line_count after:\n$outdata_v2\n";
          $line_count++;
      }

      __DATA__
      <cfmail to="#to_address#">
      </cfmail>
      <cfmail to="#to_address_2#">
      The output is:
      First Regexp solution
      --------------------
      outdata for read of line 1 before:
      <cfmail to="#to_address#">
      outdata for read of line 1 after:
      <!--- <cfmail to="#to_address#">
      outdata for read of line 2 before:
      <!--- <cfmail to="#to_address#">
      </cfmail>
      outdata for read of line 2 after:
      <!--- <!--- <cfmail to="#to_address#">
      </cfmail> --->
      outdata for read of line 3 before:
      <!--- <!--- <cfmail to="#to_address#">
      </cfmail> --->
      <cfmail to="#to_address_2#">
      outdata for read of line 3 after:
      <!--- <!--- <!--- <cfmail to="#to_address#">
      </cfmail> ---> --->
      <!--- <cfmail to="#to_address_2#">
      Second Regexp solution
      --------------------
      outdata for read of line 1 before:
      <cfmail to="#to_address#">
      outdata for read of line 1 after:
      <!--- <cfmail to="#to_address#">
      outdata for read of line 2 before:
      <!--- <cfmail to="#to_address#">
      </cfmail>
      outdata for read of line 2 after:
      <!--- <cfmail to="#to_address#">
      </cfmail> --->
      outdata for read of line 3 before:
      <!--- <cfmail to="#to_address#">
      </cfmail> --->
      <cfmail to="#to_address_2#">
      outdata for read of line 3 after:
      <!--- <cfmail to="#to_address#">
      </cfmail> --->
      <!--- <cfmail to="#to_address_2#">
      You can see that your original regexp (as Eimi Metamorphoumai correctly pointed out) is re-applied to everything accumulated so far each time a line is read, so every cfmail tag already in $outdata picks up another comment delimiter on each pass. The second regexp solution does not add a new delimiter every time, since it is constructed to match only cfmail tags that are not already preceded (or, for closing tags, followed) by a comment delimiter.
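
      Another way around it, sketched here under the assumption that your script builds $outdata line by line much like the loops above, is to run the plain substitutions on each line before appending it, so text already in $outdata is never matched again. The @lines array below is just made-up stand-in data for your file.

      ```perl
      use strict;
      use warnings;

      # Stand-in for the lines of the input file (hypothetical sample data).
      my @lines = (
          qq{<cfmail to="#to_address#">\n},
          qq{</cfmail>\n},
          qq{<cfmail to="#to_address_2#">\n},
      );

      my $outdata = '';
      for my $line (@lines) {
          # Work on a copy so the source lines stay untouched.
          my $fixed = $line;
          # Rewrite only the line just read; nothing already in $outdata is rescanned.
          $fixed =~ s{<cfmail}{<!--- <cfmail}g;
          $fixed =~ s{</cfmail>}{</cfmail> --->}g;
          $outdata .= $fixed;
      }

      print $outdata;
      ```

      Here each tag is seen by the substitutions exactly once. The lookaround version is still the safer choice if the same file might be processed twice.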
