
Re^2: Substitution don't work

by OldChamp (Acolyte)
on Aug 24, 2015 at 11:49 UTC (#1139666)


in reply to Re: Substitution don't work
in thread Substitution don't work

Hi James, thank you for your quick help. I've tried your solution and it works. I'm new to Perl, so in this first attempt I used only a tiny bit of the real data. I have therefore modified your solution a little so that it works with an input file and an output file.

The modified code is

    # Usage: perl removeEK1.pl TestEK.txt > Out.txt
    use strict;
    use warnings;

    my $regex = '\{\[%tqu.*]}';
    my $subst = '';

    while (<>) {
        my $line = $_;
        $line =~ s/$regex/$subst/gi;
        print $line;
    }

That worked fine with the following file TestEK.txt:

    [Event "?"]
    [Site "?"]
    [Date "1985.??.??"]
    [Round "?"]
    [White "Neuenschwander, Beat"]
    [Black "?"]
    [Result "1-0"]
    [Annotator "Solution"]
    [SetUp "1"]
    [FEN "8/5ppk/8/3p2KP/3P2P1/8/8/8 w - - 0 1"]
    [PlyCount "17"]
    [Source "ChessCafe/CB"]
    [SourceDate "2003.10.29"]

    BlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBla
    1. Ka6 ({Of course not} 1. b6 $2 Kb7 $11) 1... Kb8 (1... f4 2. b6 $18) {[%tqu "What is White's next move?","","",g3,"",0,b6,"misses the win:",0]} 2. g3 $1 13. g6 c3
    BlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBla
    1. Ka6 ({Of course not} 1. b6 $2 Kb7 $11) 1... Kb8 (1... f4 2. b6 $18) {[%tqu "What is White's next move?","","",g3,"",0,b6,"misses the win:",0]} 2. g3 $1 13. g6 c31. Ka6 ({Of course not} 1. b6 $2 Kb7 $11) 1... Kb8 (1... f4 2. b6 $18) {[%tqu "What is White's next move?","","",g3,"",0,b6,"misses the win:",0]} 2. g3 $1 13. g6 c31. Ka6 ({Of course not} 1. b6 $2 Kb7 $11) 1... Kb8 (1... f4 2. b6 $18) {[%tqu "What is White's next move?","","",g3,"",0,b6,"misses the win:",0]} 2. g3 $1 13. g6 c3
    1. Ka6 ({Of course not} 1. b6 $2 Kb7 $11) 1... Kb8 (1... f4 2. b6 $18) {[%tqu "What is White's next move?","","",g3,"",0,b6,"misses the win:",0]} 2. g3 $1 13. g6 c3
    BlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBla
    BlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBla
    BlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBla

I got the output I wanted, but when I tested it with a part of my real input file, it failed.

....... [Event "?"] [Site "?"] [Date "1933.??.??"] [Round "?"] [White "Grigoriev, Nikolay"] [Black "?"] [Result "*"] [Annotator "Solution"] [SetUp "1"] [FEN "k7/2p5/8/KP3p2/8/8/6P1/8 w - - 0 1"] [PlyCount "13"] [Source "ChessCafe/CB"] [SourceDate "2003.10.29"] 1. Ka6 ({Of course not} 1. b6 $2 Kb7 $11) 1... Kb8 (1... f4 2. b6 $18) {[%tqu "Wha +t is White's next move?","","",g3,"",0,b6,"misses the win:",0]} 2. g3 $1 ({ +The hasty } 2. b6 $2 {misses the win:} Kc8 $1 {with the idea 3...cxb6.} 3. b7+ K +b8 4. g3 c5 5. Kb5 Kxb7 6. Kxc5 Kc7 7. Kd5 f4 $1 8. gxf4 Kd7 $11 {Black saves t +he game by seizing the opposition.}) 2... Ka8 ({Another defensive method also +does not help} 2... Kc8 3. Ka7 Kd8 4. Kb8 $1 {(an opposition!)} Kd7 5. Kb7 Kd8 +(5... Kd6 6. Kc8 $18) 6. Kc6 {(an outflanking!)} Kc8 7. Kd5 Kb7 8. Ke5 Kb6 9. Kx +f5 Kxb5 10. g4 c5 11. g5 c4 12. Ke4 $1 {(we shall see this method - an enticem +ent of the hostile king under a check - more than once in this book)} Kb4 13. + g6 c3 14. Kd3 $1 Kb3 15. g7 c2 16. g8=Q+) {[%tqu "What is White's next move? +","","", b6,"",0]} 3. b6 Kb8 { } 4. Kb5 $1 (4. b7 $2 c5 5. Kb5 Kxb7 $11) 4... Kb7 5. bxc7 Kxc7 {[%tqu + "What is White's next move?", "","",Kc5,"",0]} 6. Kc5 Kd7 {[%tqu "What is White's next move?","","", +Kd5, "This time White has seized the opposition, therefore the pawn sacrifi +ce 7... f4 is senseless.",0]} 7. Kd5 $18 {This time White has seized the oppos +ition, therefore the pawn sacrifice 7...f4 is senseless.} * [Event "?"] [Site "?"] .......

My output file was now identical to the input file: the search text was not removed! In my first attempt I made relatively simple errors, but this time I find it difficult and cannot spot what is going wrong. Perhaps you have an idea?

Replies are listed 'Best First'.
Re^3: Substitution don't work
by AnomalousMonk (Bishop) on Aug 24, 2015 at 12:53 UTC

    In your TestEK.txt file, every  {[%tqu ... ]} sequence sits entirely on one line. In the second, real file, these sequences span two or more lines.

    You are processing the file line-by-line and matching your regex against each line, so if a  {[%tqu ... ]} sequence spans multiple lines, the regex will never see it.
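The effect described above can be reproduced with a minimal sketch. The two-line sample below is made up for illustration (it is not taken from the real file); the pattern is the one from the original script.

```perl
use strict;
use warnings;

# Hypothetical sample: one {[%tqu ... ]} block split across two lines.
my @lines = (
    "1... Kb8 {[%tqu \"What is White's\n",
    "next move?\",0]} 2. g3\n",
);

my $regex = '\{\[%tqu.*]}';    # pattern from the original script

my $changed = 0;
for my $line (@lines) {
    # s/// returns the number of substitutions, or a false value if none
    $changed += ($line =~ s/$regex//gi) || 0;
}

# Neither line contains both the opening {[%tqu and the closing ]},
# so no substitution happens and the lines are printed unchanged.
print "substitutions made: $changed\n";
print @lines;
```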

    If the file is small, less than, say, several hundred megabytes and never likely to grow larger, it might be easiest to "slurp" the entire file at once as a string into a scalar variable and then do a single  s/// against the variable, then write the string back out to the new file.
        my $string = do { local $/;  <>; };
        $string =~ s/$regex/$subst/gis;
        print $string;
    (untested). Note that the  s/// now needs a  /s regex modifier so that  . (dot) in  .* will match a newline across multiple lines. Get rid of the while-loop entirely.
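As a self-contained illustration of the slurp approach, here is a small sketch. It reads from an in-memory string instead of a real file so it can run on its own; the sample text is hypothetical, and the pattern uses a non-greedy  .*? so that each block is removed separately.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the real input file.
my $data = "1... Kb8 {[%tqu \"What is White's\nnext move?\",0]} 2. g3\n";

# Open the string as an in-memory file so the sketch needs no TestEK.txt.
open my $in, '<', \$data or die "cannot open in-memory file: $!";

# With $/ undefined inside the do-block, <$in> returns the whole file at once.
my $string = do { local $/; <$in> };
close $in;

# /s lets . match a newline, so the pattern can cross line boundaries.
(my $cleaned = $string) =~ s/\{\[%tqu.*?\]\}//gis;

print $cleaned;
```

With a real file, the in-memory handle would be replaced by  <> or by a handle opened on the file name.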

    Update: See also File::Slurp.

    Update 2: Here's a test:

    c:\@Work\Perl\monks\OldChamp>perl -wMstrict -le
    "my $s = do { local $/; <>; };
     print qq{[[$s]] \n};
     ;;
     my $rx = '{\[%tqu.*]}';
     my $su = '';
     $s =~ s/$rx/$su/gis;
     print qq{[[$s]]};
    " tqu.txt
    [[keep this {[%tqu get rid of this]} and keep this too ]]
    [[keep this and keep this too ]]

    Update 3: Update 2 contains a rookie mistake: using greedy  .* instead of the lazy  .*? version. Here's a version that will actually work with a single long string. The previous version would delete everything between the absolute first  {[%tqu and the absolute last  ]} sequence in the file.

    c:\@Work\Perl\monks\OldChamp>perl -wMstrict -le
    "my $s = do { local $/; <>; };
     print qq{[[$s]] \n};
     ;;
     my $rx = '{\[%tqu.*?]}';
     my $su = '';
     $s =~ s/$rx/$su/gis;
     print qq{[[$s]]};
    " tqu.txt
    [[keep this {[%tqu get rid of this]} and keep this too keep {[%tqu but dump also ]} it to here. ]]
    [[keep this and keep this too keep it to here. ]]
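The difference between the greedy and the lazy quantifier can be seen side by side in a short sketch. The sample string is invented for illustration and contains two separate blocks.

```perl
use strict;
use warnings;

# Hypothetical text with two separate {[%tqu ... ]} blocks.
my $text = "keep A {[%tqu drop 1]} keep B {[%tqu drop 2]} keep C";

# Greedy: .* runs from the first {[%tqu to the LAST ]} in the string.
(my $greedy = $text) =~ s/\{\[%tqu.*\]\}//gs;

# Lazy: .*? stops at the FIRST ]} after each {[%tqu.
(my $lazy = $text) =~ s/\{\[%tqu.*?\]\}//gs;

print "greedy: $greedy\n";    # "keep B" is lost
print "lazy:   $lazy\n";      # "keep B" survives
```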
    Actually, something like
        my $regex = qr{ {\[%tqu [^\]]* ]} }xms;
    might even be preferable as long as there are guaranteed to be no  ] (right-square-bracket) characters in the sub-strings to be removed, but let's leave it at that for now.
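A rough sketch of the negated-character-class variant, including the caveat just mentioned. The sample strings are hypothetical, and the braces are escaped here as a precaution for newer Perl versions.

```perl
use strict;
use warnings;

# Under /x the spaces inside the pattern are ignored.
my $regex = qr/ \{\[%tqu [^\]]* \]\} /x;

# [^\]] also matches a newline, so a block split across lines is removed.
my $text = "keep A {[%tqu drop\n1]} keep B {[%tqu drop 2]} keep C";
(my $cleaned = $text) =~ s/$regex//g;
print "$cleaned\n";

# Caveat: a ] inside the block stops [^\]]* before the real end, the
# following ]} is not found, and the block is left in place.
my $tricky = "x {[%tqu a]b]} y";
(my $untouched = $tricky) =~ s/$regex//g;
print "$untouched\n";
```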


    Give a man a fish:  <%-{-{-{-<

      Now I can understand why some monks blame me: I overlooked this reply because there was another reply by an AnomalousMonk, which I had read, and I wrongly thought this was the same one. I apologize and hope to do better in the future. I have read all the other replies and tried to understand and apply the suggestions, but this one, which seems to be the answer to my problem, I missed. I got a reply to my second question from poj, which was in principle the same solution you propose, and it worked. I got rid of the

       {[%tqu .....]}

      and with some additional work on my side I have now solved the problem, thanks to the efforts of the monks. Sorry for overlooking your reply, and many thanks to all the helpful monks.

        ... there was another reply by an AnomalousMonk which I have read and I wrongly thought this was the same one.

        I wonder if you were confusing me with the Anonymous Monk. When I chose my Monk handle, I thought it was a cute idea. Now, after many Anomalous/Anonymous confusions, the joke is growing a bit stale. Oh, well...


        Give a man a fish:  <%-{-{-{-<
