PerlMonks  

How to modify my regex?

by OldChamp (Acolyte)
on Aug 26, 2015 at 18:49 UTC

OldChamp has asked for the wisdom of the Perl Monks concerning the following question:

I want to delete every occurrence of the pattern %17\{#MARKERS.*MARKERS#} from a text file like this:

[FEN "r2qk2r/ppp3pp/2bp1n2/2b1p3/2Q1P3/5N2/PPP2PPP/RNB2RK1 w kq - 0 10"] {%17{#MARKERS - N #B(8/8/8/8/8/8/8/8) #S(8/8/8/2DO1DO3/8/8/8/8) #C(8/8/8/8/8/8/8/8) #F(8/8/8/8/8/8/8/8) MARKERS#}} 10.Nxe5 {%16I} Bxf2+ 11.Rxf2 {%1610} dxe5 12.Nc3 Qe7 13.Be3 a6 14.Rd1 $16 {Minchev Valentin - Stefanov Stefan, * [FEN "N2q1bk1/pp1b1r1p/3p1nn1/P2Pp3/1P2Pp2/5Pp1/4BBPP/2RQNRK1 b - - 0 21"] {%17{#MARKERS - N LA4(f6:h5) LA4(h5:g3) #B(8/8/8/8/8/8/8/8) #S(8/8/8/8/8/8/8/8) #C(8/8/8/8/8/8/8/8) #F(8/8/8/8/8/8/8/8) MARKERS#}} 21...Nh5 {! %16I} 22.Kh1 ( 22.Bxa7 Qh4 {%16I} 23.h3 Bxh3 {%16I} 24.gxh3 Qxh3 {%16I} 25.Rf2 gxf2+ {%16I} 26.Kxf2 Nh4 27.Bf1 ( 27.Nc2 Rg7 28.Ke1 Rg2 29.Qd3 Ng3 $19 ) Qh2+ {%16I} 28.Ng2 Rg7 $17 {%16I} ) gxf2 {%16I} ( 22...Qh4 {?} 23.Bg1 ) 23.Rxf2 Ng3+ {! %16I} 24.Kg1 ( 24.hxg3 fxg3 $19 ) Qxa8 25.Bc4 {%17{#MARKERS - N LA4(a7:g1) #B(8/8/8/8/8/8/8/8) #S(8/8/8/8/8/8/8/8) #C(8/8/8/8/8/8/8/8) #F(8/8/8/8/8/8/8/8) MARKERS#}} a6 {! ('with the idea'Qa7 'diagonals'a7-g1) %1680} 26.Qd3 ( 26.hxg3 fxg3 27.Rb2 Qd8 28.Kf1 ( 28.f4 Rxf4 29.Qd3 Bh6 30.Rcb1 ( 30.Rc3 Qh4 31.Qxg3 Qxg3 32.Rxg3 Rxe4 33.Be2 Bf4 $17 ) b5 31.Bb3 Qc7 32.Nf3 Nh4 ) Bh6 29.Ke2 Qg5 $17 ) Qa7 {%16I} 27.b5 ( 27.Rcc2 {%17{#MARKERS - N LA4(f8:e7) LA4(e7:h4) #B(8/8/8/8/8/8/8/8) #S(8/8/8/8/8/8/8/8) #C(8/8/8/8/8/8/8/8) #F(8/8/8/8/8/8/8/8) MARKERS#}} Be7 {'with the idea'Bh4} $19 {%16I} ) axb5 {%16I} 28.Bxb5 Nh1 {! 0-1, Piket Jeroen - Kasparov Gary, Tilburg 1989 It %1620} *

I have tried it with this code:

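#!/usr/bin/perl
# Remove extra text from a PGN or text file.
# Usage: perl removeEK4.pl In.txt > Out.txt
use strict;
use warnings;

my $regex = '%17\{#MARKERS.*MARKERS#}';

# Slurp the whole input into one string.
my $line = do { local $/; <>; };
# Join all lines with spaces.
$line =~ s/\n/ /g;
# Replace each %17{#MARKERS ... MARKERS#} comment with a space.
$line =~ s/$regex/ /gi;
# Start a new line before each move number ("10. ") or tag ("[").
$line =~ s/ (\d+\. |\[)/\n$1/g;
print $line;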

What I wanted to get was this:

[FEN "r2qk2r/ppp3pp/2bp1n2/2b1p3/2Q1P3/5N2/PPP2PPP/RNB2RK1 w kq - 0 10"] 10.Nxe5 {%16I} Bxf2+ 11.Rxf2 {%1610} dxe5 12.Nc3 Qe7 13.Be3 a6 14.Rd1 $16 {Minchev Valentin - Stefanov Stefan, * [FEN "N2q1bk1/pp1b1r1p/3p1nn1/P2Pp3/1P2Pp2/5Pp1/4BBPP/2RQNRK1 b - - 0 21"] 21...Nh5 {! %16I} 22.Kh1 ( 22.Bxa7 Qh4 {%16I} 23.h3 Bxh3 {%16I} 24.gxh3 Qxh3 {%16I} 25.Rf2 gxf2+ {%16I} 26.Kxf2 Nh4 27.Bf1 ( 27.Nc2 Rg7 28.Ke1 Rg2 29.Qd3 Ng3 $19 ) Qh2+ {%16I} 28.Ng2 Rg7 $17 {%16I} ) gxf2 {%16I} ( 22...Qh4 {?} 23.Bg1 ) 23.Rxf2 Ng3+ {! %16I} 24.Kg1 ( 24.hxg3 fxg3 $19 ) Qxa8 25.Bc4 a6 {! ('with the idea'Qa7 'diagonals'a7-g1) %1680} 26.Qd3 ( 26.hxg3 fxg3 27.Rb2 Qd8 28.Kf1 ( 28.f4 Rxf4 29.Qd3 Bh6 30.Rcb1 ( 30.Rc3 Qh4 31.Qxg3 Qxg3 32.Rxg3 Rxe4 33.Be2 Bf4 $17 ) b5 31.Bb3 Qc7 32.Nf3 Nh4 ) Bh6 29.Ke2 Qg5 $17 ) Qa7 {%16I} 27.b5 ( 27.Rcc2 Be7 {'with the idea'Bh4} $19 {%16I} ) axb5 {%16I} 28.Bxb5 Nh1 {! 0-1, Piket Jeroen - Kasparov Gary, Tilburg 1989 It %1620} *

but what I really get is this:

[FEN "r2qk2r/ppp3pp/2bp1n2/2b1p3/2Q1P3/5N2/PPP2PPP/RNB2RK1 w kq - 0 10"]  { } Be7 {'with the idea'Bh4} $19 {%16I} ) axb5 {%16I} 28.Bxb5 Nh1 {! 0-1, Piket Jeroen - Kasparov Gary, Tilburg 1989 It %1620} *

I tried my regex in an online regex tester, and it showed me that my regex finds the first occurrence of the pattern but then matches everything up to the last MARKERS#}. What is wrong with my regex?

Re: How to modify my regex?
by AnomalousMonk (Bishop) on Aug 26, 2015 at 19:05 UTC
    What is wrong with my regex?

    It's too greedy. Make .* less greedy by adding the ? "lazy" modifier to the * "zero-or-more" quantifier, making it .*? instead.
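A minimal sketch of the fix applied to the OP's own pattern. The sample string is a hypothetical, shortened stand-in for the OP's file (two marker comments with ordinary moves in between), and the third substitution, which also swallows the outer { } and one trailing space, is an illustrative extra rather than something posted in the thread. It is included because the OP's "wanted" output has no empty { } left behind.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, shortened sample in the same format as the OP's file.
my $text = '{%17{#MARKERS - N #B(8/8) MARKERS#}} 10.Nxe5 {%16I} 25.Bc4 '
         . '{%17{#MARKERS - N LA4(a7:g1) MARKERS#}} a6';

# Greedy .* : runs from the first "%17{#MARKERS" to the LAST "MARKERS#}",
# swallowing every move in between (the OP's symptom).
(my $greedy = $text) =~ s/%17\{#MARKERS.*MARKERS#}/ /g;

# Lazy .*? : each match stops at the FIRST "MARKERS#}" after its start,
# so each marker comment is removed on its own.
(my $lazy = $text) =~ s/%17\{#MARKERS.*?MARKERS#}/ /g;

# Lazy, and also consuming the outer braces plus one trailing space,
# so no empty "{ }" is left behind.
(my $no_braces = $text) =~ s/\{%17\{#MARKERS.*?MARKERS#\}\} ?//g;

print "greedy:    $greedy\n";
print "lazy:      $lazy\n";
print "no braces: $no_braces\n";
```

With this sample, the greedy version leaves only "{ } a6", while the lazy version keeps "10.Nxe5 {%16I} 25.Bc4" between two empty "{ }" pairs.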

    Update: See Quantifiers in perlre; also see discussions of greedy/lazy matching in perlretut. (Update: See also Quantifiers in regular expressions in Tutorials.)

    Update 2: Example:

    c:\@Work\Perl\monks\OldChamp>perl -wMstrict -le
    "my $s = 'keep me ZIP delete this ZAP also keep this ZIP kill too ZAP keep too';
     print qq{'$s'};
     ;;
     (my $t = $s) =~ s{ ZIP .* ZAP }{}xmsg;
     print qq{greedy .*: '$t'};
     ;;
     ($t = $s) =~ s{ ZIP .*? ZAP }{}xmsg;
     print qq{lazy .*?: '$t'};
    "
    'keep me ZIP delete this ZAP also keep this ZIP kill too ZAP keep too'
    greedy .*: 'keep me  keep too'
    lazy .*?: 'keep me  also keep this  keep too'


    Give a man a fish:  <%-{-{-{-<

      Hi, AnomalousMonk, many thanks for helping me again; now my program works. It's really astonishing what a difference this "lazy" modifier makes! I had read about greedy and lazy matching, but without an example I had not really understood what it means. Now I think I have at least an idea of this modifier, and tomorrow I will visit the links you have supplied and hopefully become a little more knowledgeable.

        You're very welcome. It may seem a bit churlish to bring this up after your gracious thanks, but it dawned on me that the subject of the OP was familiar, and I was right: Re^3: Substitution don't work discusses a similar problem. Updates 2 & 3 there give contrasting examples of .* versus .*? (greedy versus lazy) behavior. (Also see the discussion there of the effect of the /s regex modifier on the behavior of the . (dot) metacharacter when matching against a multi-line string.)
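A small sketch of the /s effect mentioned above, using a hypothetical two-comment string in which the first marker comment spans a line break. The OP's script sidesteps this by turning every newline into a space (s/\n/ /g) before the substitution; the sketch shows what happens when that step is skipped. Note that combining /s with a greedy .* would let a single match run across lines all the way to the last delimiter in the file.

```perl
use strict;
use warnings;

# Hypothetical sample: the first marker comment spans a line break.
my $text = "a {%17{#MARKERS x\ny MARKERS#}} b {%17{#MARKERS z MARKERS#}} c";

# Without /s, . never matches "\n", so the two-line comment survives;
# only the one-line comment is removed.
(my $without_s = $text) =~ s/\{%17\{#MARKERS.*?MARKERS#\}\}//g;

# With /s, . also matches "\n", so both comments are removed.
(my $with_s = $text) =~ s/\{%17\{#MARKERS.*?MARKERS#\}\}//gs;

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
```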

        These comments are not intended to give you a hard time, but to suggest that the advice of the humble Monks will be of more value to you the more attention you pay it.



Re: How to modify my regex?
by dave_the_m (Monsignor) on Aug 26, 2015 at 20:24 UTC
    Also, stop putting your regex pattern in a literal string, i.e., replace
    my $regex = 'PATTERN'; s/$regex/.../;
    with
    s/PATTERN/.../;
    or
    my $regex = qr/PATTERN/; s/$regex/.../;
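    Otherwise you're soon going to run into the different quoting rules for literal strings and literal patterns; e.g., "\bfoo" and qr/\bfoo/ mean two different things: in the double-quoted string \b is a backspace character, while in the pattern it is a word-boundary assertion.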
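A short sketch of the quoting difference, with a made-up target string. It also shows why the OP's single-quoted pattern happened to work: single quotes keep the backslash (only \\ and \' are special there), so the regex engine still sees \b or \{. A qr// literal avoids having to think about it at all.

```perl
use strict;
use warnings;

my $target = 'a foo here';

# Double-quoted string: \b is a BACKSPACE character (chr 8),
# so this holds 4 characters: backspace, 'f', 'o', 'o'.
my $double_quoted = "\bfoo";

# Single-quoted string: the backslash is kept, 5 characters: \ b f o o.
my $single_quoted = '\bfoo';

# Regex literal: \b is a word-boundary assertion.
my $compiled = qr/\bfoo/;

my $double_hit = $target =~ /$double_quoted/ ? 1 : 0;  # no backspace in $target
my $single_hit = $target =~ /$single_quoted/ ? 1 : 0;  # regex engine sees \b
my $qr_hit     = $target =~ $compiled        ? 1 : 0;  # word boundary before foo

printf "lengths: %d vs %d\n", length($double_quoted), length($single_quoted);
print "double-quoted: $double_hit, single-quoted: $single_hit, qr//: $qr_hit\n";
```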
Re: How to modify my regex?
by Laurent_R (Canon) on Aug 26, 2015 at 20:25 UTC
    AnomalousMonk said it all, but just to put it in other terms: you want your regex to stop at the first occurrence of the end delimiter (I mean the end of the pattern), rather than the last one.

    So simply:

    my $regex = '%17\{#MARKERS.*?MARKERS#}';
    With the ? modifier, the regex engine no longer matches as much as possible, i.e., up to the very last occurrence of the MARKERS#} end delimiter, but is happy to stop at the first one, which is what you need here. Actually, when quantifying the . (any character) metacharacter, .* or .+ is relatively rarely what you want; quite often .*? and .+? are better. But that's not a general rule: there are also cases when you want to match up to the last occurrence of the end delimiter.
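One illustration of a case where greedy matching is what you want, using a hypothetical relative path: to grab everything before the file name, the match has to run up to the last "/", not the first. This is only a sketch of the greedy/lazy contrast; for real path handling, the core File::Basename module (dirname, basename) is usually the more robust choice.

```perl
use strict;
use warnings;

my $path = 'games/2015/kasparov.pgn';   # hypothetical path

# Greedy .* backtracks only as far as the LAST "/", so this captures
# everything before the file name.
my ($dir) = $path =~ m{^(.*)/};

# Lazy .*? stops at the FIRST "/", so this captures only the first component.
my ($first) = $path =~ m{^(.*?)/};

print "greedy: $dir\n";    # games/2015
print "lazy:   $first\n";  # games
```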

      Hi Laurent_R, thank you for your clear explanation.

Node Type: perlquestion [id://1140095]
Approved by ww