PerlMonks  

Substitution don't work

by OldChamp (Acolyte)
on Aug 24, 2015 at 01:01 UTC ( #1139611=perlquestion )

OldChamp has asked for the wisdom of the Perl Monks concerning the following question:

I want to remove some text of which I know the starting characters and the two ending characters. The text I want to remove is part of a text file, TestEK.txt, and the cleaned text should go into a new text file, Out.txt. My regex worked in an online regex tester, but not in my Perl file, and I cannot find out why.

#!/usr/bin/perl -w
# Remove extra text from a PGN or Text-file.
# Usage: perl remove2.pl TestEK.txt > Out.txt
# This is TestEK.txt:
# 1. Ka6 ({Of course not} 1. b6 $2 Kb7 $11) 1... Kb8 (1... f4 2. b6 $18) {[%tqu "What is White's next move?","","",g3,"",0,b6,"misses the win:",0]} 2. g3 $1 13. g6 c3
# This should be the Out.txt
# 1. Ka6 ({Of course not} 1. b6 $2 Kb7 $11) 1... Kb8 (1... f4 2. b6 $18) } 2. g3 $1 13. g6 c3
use strict;

my $regex =/\{\[%tqu .* ]}/;
my $subst = //;

while(<>)
{
    my $line = $_;
    s/$regex/$subst/gi;
}

Replies are listed 'Best First'.
Re: Substitution don't work
by davido (Cardinal) on Aug 24, 2015 at 01:13 UTC

    In your sample input there is no space between 0 and ]. In your regular expression you have the pattern .* ] (there's a space). Did you maybe need to put a /x modifier? Or eliminate that space in the pattern.
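    A minimal sketch of the whitespace point, assuming a made-up one-line sample ($text and the three result variables are hypothetical names, not from the original post). It compares the pattern with its literal space, the same pattern under /x, and the pattern with the stray space removed:

```perl
use strict;
use warnings;

# Hypothetical sample: no space between the 0 and the closing ]}.
my $text = 'f4 {[%tqu "next?",0]} 2. g3';

# Without /x, the space before \] must match a literal space in the text.
my $with_space = ($text =~ /\{\[%tqu .* \]\}/)  ? 1 : 0;

# With /x, unescaped whitespace in the pattern is ignored.
my $with_x     = ($text =~ /\{\[%tqu .* \]\}/x) ? 1 : 0;

# Removing the space before \] also works without /x.
my $no_space   = ($text =~ /\{\[%tqu .*\]\}/)   ? 1 : 0;

print "literal space: $with_space, /x: $with_x, space removed: $no_space\n";
```

    Only the first variant fails to match here, because the sample has no space in front of the closing ]}.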

    Also, this line: my $subst = // is not doing what you think. It is doing a match of $_ =~ // in scalar context and assigning the Boolean result to $subst. I think you probably wanted my $subst = '';, or my $subst = q//;.
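    As a rough illustration, assuming a made-up sample line ($line and $cleaned are hypothetical names), one way to store the pattern and the replacement without running a match at assignment time is qr// for the pattern and a plain empty string for the replacement:

```perl
use strict;
use warnings;

# Hypothetical one-line sample in the spirit of the question's data.
my $line = 'Kb8 {[%tqu "next move?",0]} 2. g3';

# qr// compiles and stores the pattern. A bare /.../ on the right-hand side
# would instead run a match against the current value of the default
# variable and store only the true or false result.
my $regex = qr/\{\[%tqu.*?\]\}/;

# The replacement is an ordinary string; '' means "replace with nothing".
my $subst = '';

(my $cleaned = $line) =~ s/$regex/$subst/g;
print "$cleaned\n";
```

    The two spaces that surrounded the removed block are left behind in $cleaned.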


    Dave

      ... my $subst = // is not doing what you think. It is doing a match of $_ =~ // in scalar context ...

      Likewise
          my $regex =/\{\[%tqu .* ]}/;

      OldChamp: You seem to have warnings enabled, but if so, you should have posted the warning messages generated.


      Give a man a fish:  <%-{-{-{-<

      Hi Dave, thank you for your help. I have removed the space and I have corrected the $subst to my $subst ='';, but I still get an empty file Out.txt.

      I'm just learning Perl; I programmed in Visual Basic and VBA long, long ago.

        but I still get an empty file Out.txt.

        You should print if you want your program to produce some output.

Re: Substitution don't work
by james28909 (Deacon) on Aug 24, 2015 at 02:00 UTC

    The main problem is in your regex declaration: you wrapped the pattern in / ... /, so Perl ran it as a match against $_ on the spot instead of storing the pattern in $regex, and it warned about an uninitialized value. Here is the warning I was getting:

    Use of uninitialized value $_ in pattern match (m//) at C:\User.....

    Here's a solution I came up with:

    use strict;
    use warnings;

    my $regex = '\{\[%tqu.*]}';
    my $subst = '';

    while(my $line = <DATA>)
    {
        $line =~ s/$regex/$subst/gi;
        print $line."\n";
    }

    __DATA__
    1. Ka6 ({Of course not} 1. b6 $2 Kb7 $11) 1... Kb8 (1... f4 2. b6 $18) {[%tqu "What is White's next move?","","",g3,"",0,b6,"misses the win:",0]} 2. g3 $1 13. g6 c3

    Outputs:

    1. Ka6 ({Of course not} 1. b6 $2 Kb7 $11) 1... Kb8 (1... f4 2. b6 $18)  2. g3 $1 13. g6 c3

      Hi James, thank you for your quick help. I've tried your solution and it works. I'm new to Perl, so in this first attempt I have only used a tiny bit of the real data. I have modified your solution a little so that it works with an input file and an output file.

      The modified code is

      # Usage: perl removeEK1.pl TestEK.txt > Out.txt
      use strict;
      use warnings;

      my $regex = '\{\[%tqu.*]}';
      my $subst = '';

      while(<>)
      {
          my $line =$_;
          $line =~ s/$regex/$subst/gi;
          print $line;
      }

      That worked fine with the following file TestEK.txt:

      [Event "?"]
      [Site "?"]
      [Date "1985.??.??"]
      [Round "?"]
      [White "Neuenschwander, Beat"]
      [Black "?"]
      [Result "1-0"]
      [Annotator "Solution"]
      [SetUp "1"]
      [FEN "8/5ppk/8/3p2KP/3P2P1/8/8/8 w - - 0 1"]
      [PlyCount "17"]
      [Source "ChessCafe/CB"]
      [SourceDate "2003.10.29"]
      BlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBla
      1. Ka6 ({Of course not} 1. b6 $2 Kb7 $11) 1... Kb8 (1... f4 2. b6 $18) {[%tqu "What is White's next move?","","",g3,"",0,b6,"misses the win:",0]} 2. g3 $1 13. g6 c3
      BlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBla
      1. Ka6 ({Of course not} 1. b6 $2 Kb7 $11) 1... Kb8 (1... f4 2. b6 $18) {[%tqu "What is White's next move?","","",g3,"",0,b6,"misses the win:",0]} 2. g3 $1 13. g6 c31. Ka6 ({Of course not} 1. b6 $2 Kb7 $11) 1... Kb8 (1... f4 2. b6 $18) {[%tqu "What is White's next move?","","",g3,"",0,b6,"misses the win:",0]} 2. g3 $1 13. g6 c31. Ka6 ({Of course not} 1. b6 $2 Kb7 $11) 1... Kb8 (1... f4 2. b6 $18) {[%tqu "What is White's next move?","","",g3,"",0,b6,"misses the win:",0]} 2. g3 $1 13. g6 c3
      1. Ka6 ({Of course not} 1. b6 $2 Kb7 $11) 1... Kb8 (1... f4 2. b6 $18) {[%tqu "What is White's next move?","","",g3,"",0,b6,"misses the win:",0]} 2. g3 $1 13. g6 c3
      BlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBla
      BlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBla
      BlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBlaBla

      I got the output I wanted, but when I tested it with a part of my real input file, it failed.

      .......
      [Event "?"]
      [Site "?"]
      [Date "1933.??.??"]
      [Round "?"]
      [White "Grigoriev, Nikolay"]
      [Black "?"]
      [Result "*"]
      [Annotator "Solution"]
      [SetUp "1"]
      [FEN "k7/2p5/8/KP3p2/8/8/6P1/8 w - - 0 1"]
      [PlyCount "13"]
      [Source "ChessCafe/CB"]
      [SourceDate "2003.10.29"]

      1. Ka6 ({Of course not} 1. b6 $2 Kb7 $11) 1... Kb8 (1... f4 2. b6 $18) {[%tqu
      "What is White's next move?","","",g3,"",0,b6,"misses the win:",0]} 2. g3 $1
      ({The hasty } 2. b6 $2 {misses the win:} Kc8 $1 {with the idea 3...cxb6.} 3. b7+
      Kb8 4. g3 c5 5. Kb5 Kxb7 6. Kxc5 Kc7 7. Kd5 f4 $1 8. gxf4 Kd7 $11 {Black saves
      the game by seizing the opposition.}) 2... Ka8 ({Another defensive method also
      does not help} 2... Kc8 3. Ka7 Kd8 4. Kb8 $1 {(an opposition!)} Kd7 5. Kb7 Kd8
      (5... Kd6 6. Kc8 $18) 6. Kc6 {(an outflanking!)} Kc8 7. Kd5 Kb7 8. Ke5 Kb6 9.
      Kxf5 Kxb5 10. g4 c5 11. g5 c4 12. Ke4 $1 {(we shall see this method - an
      enticement of the hostile king under a check - more than once in this book)} Kb4
      13. g6 c3 14. Kd3 $1 Kb3 15. g7 c2 16. g8=Q+) {[%tqu "What is White's next
      move?","","", b6,"",0]} 3. b6 Kb8 { } 4. Kb5 $1 (4. b7 $2 c5 5. Kb5 Kxb7 $11)
      4... Kb7 5. bxc7 Kxc7 {[%tqu "What is White's next move?", "","",Kc5,"",0]} 6.
      Kc5 Kd7 {[%tqu "What is White's next move?","","",Kd5, "This time White has
      seized the opposition, therefore the pawn sacrifice 7... f4 is senseless.",0]}
      7. Kd5 $18 {This time White has seized the opposition, therefore the pawn
      sacrifice 7...f4 is senseless.} *

      [Event "?"]
      [Site "?"]
      .......

      My output file was now the same as the input file; the search text was not removed! In my first attempt I made relatively simple errors, but this one looks harder and I am not able to spot what is going wrong. Perhaps you have an idea?

        In your TestEK.txt file, you have all  {[%tqu ... ]} sequences on the same line. In the second, real file, these sequences span two or more lines.

        You are processing the file line-by-line and matching your regex against each line, so if a  {[%tqu ... ]} sequence spans multiple lines, the regex will never see it.
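        A small sketch of why the line-by-line loop leaves such a file untouched. The two-line $input string and the in-memory filehandle are assumptions made only so the example is self-contained; the real script reads from <>:

```perl
use strict;
use warnings;

# Hypothetical input: one {[%tqu ... ]} block split across two lines.
my $input = "Kb8 {[%tqu \"next\nmove?\",0]} 2. g3\n";

my $regex = qr/\{\[%tqu.*?\]\}/;

# Read the string line by line through an in-memory filehandle.
my $per_line = '';
open my $fh, '<', \$input or die "cannot open in-memory file: $!";
while (my $line = <$fh>) {
    # Neither line holds both the opening and the closing marker,
    # so the substitution never finds anything to remove.
    $line =~ s/$regex//g;
    $per_line .= $line;
}
close $fh;

print $per_line eq $input ? "unchanged\n" : "changed\n";
```

        Each line is matched in isolation, so a block that starts on one line and ends on the next is never seen as a whole.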

        If the file is small, less than, say, several hundred megabytes and never likely to grow larger, it might be easiest to "slurp" the entire file at once as a string into a scalar variable and then do a single  s/// against the variable, then write the string back out to the new file.
            my $string = do { local $/;  <>; };
            $string =~ s/$regex/$subst/gis;
            print $string;
        (untested). Note that the  s/// now needs a  /s regex modifier so that  . (dot) in  .* will match a newline across multiple lines. Get rid of the while-loop entirely.

        Update: See also File::Slurp.

        Update 2: Here's a test:

        c:\@Work\Perl\monks\OldChamp>perl -wMstrict -le
        "my $s = do { local $/; <>; };
         print qq{[[$s]] \n};
         ;;
         my $rx = '{\[%tqu.*]}';
         my $su = '';
         $s =~ s/$rx/$su/gis;
         print qq{[[$s]]};
        " tqu.txt
        [[keep this {[%tqu get rid of this]} and keep this too ]]
        [[keep this and keep this too ]]

        Update 3: Update 2 contains a rookie mistake: using greedy  .* instead of the lazy  .*? version. Here's a version that will actually work with a single long string. The previous version would delete everything between the absolute first  {[%tqu and the absolute last  ]} sequence in the file.
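        To make the difference concrete, here is a minimal comparison on a made-up string that stands in for a slurped file ($text, $greedy and $lazy are hypothetical names):

```perl
use strict;
use warnings;

# Hypothetical slurped text with two blocks, each spanning a line break.
my $text = "keep A {[%tqu drop\n1]} keep B {[%tqu drop\n2]} keep C\n";

# Greedy: .* runs on to the last ]} in the string.
(my $greedy = $text) =~ s/\{\[%tqu.*\]\}//gs;

# Lazy: .*? stops at the first ]} after each opening marker.
(my $lazy = $text) =~ s/\{\[%tqu.*?\]\}//gs;

print "greedy: $greedy";
print "lazy:   $lazy";
```

        The greedy version also deletes "keep B", which sits between the two blocks; the lazy version keeps it.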

        c:\@Work\Perl\monks\OldChamp>perl -wMstrict -le
        "my $s = do { local $/; <>; };
         print qq{[[$s]] \n};
         ;;
         my $rx = '{\[%tqu.*?]}';
         my $su = '';
         $s =~ s/$rx/$su/gis;
         print qq{[[$s]]};
        " tqu.txt
        [[keep this {[%tqu get rid of this]} and keep this too keep {[%tqu but dump also ]} it to here. ]]
        [[keep this and keep this too keep it to here. ]]
        Actually, something like
            my $regex = qr{ {\[%tqu [^\]]* ]} }xms;
        might even be preferable as long as there are guaranteed to be no  ] (right-square-bracket) characters in the sub-strings to be removed, but let's leave it at that for now.
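        A sketch of that negated-character-class idea on a made-up sample. The braces and brackets are escaped throughout, which is a stylistic choice of this example rather than something the post above requires, and $text and $cleaned are hypothetical names:

```perl
use strict;
use warnings;

# Hypothetical slurped text with two blocks, each spanning a line break.
my $text = "keep A {[%tqu drop\n1]} keep B {[%tqu drop\n2]} keep C\n";

my $regex = qr/
    \{ \[ %tqu    # literal opening marker
    [^\]]*        # any run of characters other than a right bracket
    \] \}         # literal closing marker
/x;

(my $cleaned = $text) =~ s/$regex//g;
print $cleaned;
```

        Because the character class matches newlines as well, no /s modifier is needed, and the match cannot run past the first right bracket into a later block.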


        Give a man a fish:  <%-{-{-{-<

Node Type: perlquestion [id://1139611]
Approved by davido