
Re^3: Substitution don't work

by AnomalousMonk (Archbishop)
on Aug 24, 2015 at 12:53 UTC (#1139674)


in reply to Re^2: Substitution don't work
in thread Substitution don't work

In your TestEK.txt file, each {[%tqu ... ]} sequence sits entirely on one line. In the second, real file, these sequences span two or more lines.

You are processing the file line-by-line and matching your regex against each line, so if a  {[%tqu ... ]} sequence spans multiple lines, the regex will never see it.
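To illustrate the point, here is a minimal sketch comparing the two approaches on a made-up string. The sample text, the sub names, and the lazy pattern are assumptions standing in for your data and your regex, not code from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample: one {[%tqu ... ]} sequence spanning two lines.
my $text = "keep this {[%tqu get rid\nof this]} and keep this too\n";

# Line-by-line: each line is matched on its own, so the regex never sees
# the opening {[%tqu and the closing ]} in the same string.
sub strip_by_line {
    my ($input) = @_;
    my $out = '';
    for my $line (split /^/, $input) {    # split at line starts, keeping newlines
        $line =~ s/\{\[%tqu.*?\]\}//gis;
        $out .= $line;
    }
    return $out;
}

# Whole string: both markers are visible to a single match attempt.
sub strip_whole {
    my ($input) = @_;
    $input =~ s/\{\[%tqu.*?\]\}//gis;
    return $input;
}

print strip_by_line($text);    # the sequence survives
print strip_whole($text);      # the sequence is removed
```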

If the file is small (say, less than several hundred megabytes) and unlikely ever to grow larger, it is probably easiest to "slurp" the entire file into a scalar variable as a single string, run a single s/// against that variable, and then write the string out to the new file.
    my $string = do { local $/;  <>; };
    $string =~ s/$regex/$subst/gis;
    print $string;
(untested). Note that the s/// now needs the /s regex modifier so that . (dot) in .* also matches newlines, which lets a match span multiple lines. Get rid of the while-loop entirely.
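For a script that names its input and output files explicitly rather than relying on <> and STDOUT, the same idea might look roughly like this. The sub name strip_file and its argument order are hypothetical; the pattern is an assumed stand-in for your $regex, compiled with its own /is flags because flags on the outer s/// do not apply to a qr// object.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp $in_path, apply one substitution to the
# whole string, write the result to $out_path, and return it.
sub strip_file {
    my ($in_path, $out_path, $regex, $subst) = @_;

    open my $in, '<', $in_path or die "Cannot read $in_path: $!";
    my $string = do { local $/; <$in> };    # $/ undefined: read everything at once
    close $in;

    $string =~ s/$regex/$subst/g;

    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    print {$out} $string;
    close $out or die "Cannot close $out_path: $!";

    return $string;
}

# Assumed pattern; run as: perl script.pl INPUT OUTPUT
strip_file(@ARGV[0, 1], qr/\{\[%tqu.*?\]\}/is, '') if @ARGV == 2;
```

The local $/ is confined to the do-block, so the rest of the program keeps the normal line-at-a-time behaviour.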

Update: See also File::Slurp.

Update 2: Here's a test:

    c:\@Work\Perl\monks\OldChamp>perl -wMstrict -le "my $s = do { local $/; <>; }; print qq{[[$s]] \n}; ;; my $rx = '{\[%tqu.*]}'; my $su = ''; $s =~ s/$rx/$su/gis; print qq{[[$s]]}; " tqu.txt
    [[keep this {[%tqu get rid of this]} and keep this too ]]
    [[keep this and keep this too ]]

Update 3: Update 2 contains a rookie mistake: using greedy  .* instead of the lazy  .*? version. Here's a version that will actually work with a single long string. The previous version would delete everything between the absolute first  {[%tqu and the absolute last  ]} sequence in the file.

    c:\@Work\Perl\monks\OldChamp>perl -wMstrict -le "my $s = do { local $/; <>; }; print qq{[[$s]] \n}; ;; my $rx = '{\[%tqu.*?]}'; my $su = ''; $s =~ s/$rx/$su/gis; print qq{[[$s]]}; " tqu.txt
    [[keep this {[%tqu get rid of this]} and keep this too keep {[%tqu but dump also ]} it to here. ]]
    [[keep this and keep this too keep it to here. ]]
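The difference between the two quantifiers can be seen side by side in a small sketch. The sample string is modelled on the tqu.txt contents shown in the output; its exact line breaks are an assumption.

```perl
use strict;
use warnings;

# Two {[%tqu ... ]} sequences with text to keep in between (line breaks assumed).
my $s = "keep this {[%tqu get rid\nof this]} and keep\n"
      . "this too keep {[%tqu but dump\nalso ]} it to here.\n";

# Greedy .* runs from the first {[%tqu to the LAST ]} in the string.
(my $greedy = $s) =~ s/\{\[%tqu.*\]\}//gis;

# Lazy .*? stops at the first ]} after each {[%tqu.
(my $lazy = $s) =~ s/\{\[%tqu.*?\]\}//gis;

print "greedy: [[$greedy]]\n";
print "lazy:   [[$lazy]]\n";
```

With the greedy version, the text between the two sequences is lost along with them; the lazy version removes each sequence separately.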
Actually, something like
    my $regex = qr{ {\[%tqu [^\]]* ]} }xms;
might even be preferable as long as there are guaranteed to be no  ] (right-square-bracket) characters in the sub-strings to be removed, but let's leave it at that for now.
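As a rough sketch of that caveat, the following uses an equivalent pattern written without /x and with the braces escaped (a choice made here to stay clear of unescaped-brace restrictions in newer perls). The sub name and both sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Same idea as above: anything except ] between the markers.
# No /s needed, since a negated character class already matches newlines.
my $regex = qr/\{\[%tqu[^\]]*\]\}/;

sub strip_tqu {
    my ($text) = @_;
    $text =~ s/$regex//g;
    return $text;
}

my $ok  = "keep {[%tqu drop\nthis]} keep too\n";          # no ] inside the sequence
my $bad = "keep {[%tqu has a ] inside]} keep too\n";      # stray ] inside the sequence

print strip_tqu($ok);     # sequence removed
print strip_tqu($bad);    # left untouched: the class stops at the stray ]
```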


Give a man a fish:  <%-{-{-{-<

Re^4: Substitution don't work
by OldChamp (Acolyte) on Aug 25, 2015 at 08:10 UTC

    Now I can understand why some monks blame me. I overlooked this reply because there was another reply by an AnomalousMonk which I have read, and I wrongly thought this was the same one. I apologize and hope to do better in the future. I have read all the other replies and tried to understand and apply the suggestions, but I missed this one, which seems to be the answer to my problem. I got a reply to my second question from poj which was in principle the same solution you propose, and it worked: I got rid of the

     {[%tqu .....]}

    and with some additional work from my side now I have solved the problem thanks to the efforts of the monks. Sorry for overlooking your reply and many thanks to all the helpful monks.

      ... there was another reply by an AnomalousMonk which I have read and I wrongly thought this was the same one.

      I wonder if you were confusing me with the Anonymous Monk. When I chose my Monk handle, I thought it was a cute idea. Now, after many Anomalous/Anonymous confusions, the joke is growing a bit stale. Oh, well...


      Give a man a fish:  <%-{-{-{-<
