http://www.perlmonks.org?node_id=1148066

OldChamp has asked for the wisdom of the Perl Monks concerning the following question:

Sorry for obviously not describing my problem clear enough, so here is the updated Question.

I can use perl 5.010 and there is no problem in reading the complete inputfile in memory.

I have text in a file of the following form, where the dots stands for a random amount of other text. After reading the input-file, all the newline characters are removed and my text is in $line.

.....P1p1/6P1/7P/5PK1/8 w - - 4 34"] {Weiß am Zug} * ..... 1n5P/1P4PK/1q6 b - - 2 42"] {Weiß am Zug} * ..... .......5K2/3R4 w - - 1 33"] {Weiß am Zug} * .....

If I have a space, followed by the letter b, followed by another space, as shown here in the third line, I want to substitute only the following 'Weiss' with 'Schwarz' and then continue to go through the text until there is another occurence of the pattern (space, followed by the letter b, followed by another space), when the next substition should follow.

So the core of the problem is: If I find a certain pattern-1 in the text, then go on till I find a certain pattern-2 and substitute this with a pattern-3. Do this again for the whole text. I have tried the following:

my $line = do { local $/; <>; }; $line =~ s/\n/ /g; # if ( $line =~ m/\sb\s/ ) # { # $line =~ s/Weiss/Schwarz/g; # } $line =~ s/(?<=\sb\s)Weiß/Schwarz/g; ... print $line;

but neither of this gives me the desired result, the text didn't change. How to solve this problem?

Replies are listed 'Best First'.
Re: Regex with condition
by jeffa (Bishop) on Nov 18, 2015 at 22:54 UTC

    This was asked and answered just yesterday ... er 2 days ago: y/·/./ gives two points instead of one.

    You need to add use utf8;

    use utf8; my $str = ' {Weiß am Zug} *'; $str =~ s/Weiß/Schwarz/g; print $str;

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: Regex with condition
by AnomalousMonk (Archbishop) on Nov 18, 2015 at 23:55 UTC

    If

    • you want to substitute  'Weis' with  'Schwarz' (I will use only latin characters — simpler for me) only in those lines that immediately follow a line with a  'b' in it, and
    • (update: a  'b' never occurs in a line that has a  'Weis' you want to change, and)
    • you can slurp the entire file into memory at once, and
    • your Perl version is 5.10 or greater (for the  \K operator),
    here's a way:
    c:\@Work\Perl\monks>perl -wMstrict -le "use 5.010; ;; my $t = qq{ .....P1p1/6P1/7P/5PK1/8 w - - 4 34]\n} . qq{ {Weis am Zug} * .....\n} . qq{1n5P/1P4PK/1q6 b - - 2 42]\n} . qq{ {Weis am Zug} * .....\n} . qq{.......5K2/3R4 w - - 1 33]\n} . qq{ {Weis am Zug} *\n} ; print qq{[[$t]]}; ;; my $pre = qr{ \S+ \s }xms; my $post = qr{ \s [^\n]+ \n [^\{]+ \{ }xms; ;; $t =~ s{ ^ $pre b $post \K Weis \b} {Schwarz}xmsg; print qq{[[$t]]}; " [[ .....P1p1/6P1/7P/5PK1/8 w - - 4 34] {Weis am Zug} * ..... 1n5P/1P4PK/1q6 b - - 2 42] {Weis am Zug} * ..... .......5K2/3R4 w - - 1 33] {Weis am Zug} * ]] [[ .....P1p1/6P1/7P/5PK1/8 w - - 4 34] {Weis am Zug} * ..... 1n5P/1P4PK/1q6 b - - 2 42] {Schwarz am Zug} * ..... .......5K2/3R4 w - - 1 33] {Weis am Zug} * ]]
    (Also note that I left out all the  " (double-quote) characters — they confuse my REPL.) If your Perl version is pre-5.10, let me know; there's a simple work-around.

    Update: If your file is too large to fit entirely in memory and you must process line-by-line in a while-loop, your code might look something like this (untested):

    use 5.010; ... my $pre_b = qr{ ... }xms; my $pre_weis = qr{ ... }xms; my $b_was_seen; while (my $line = <$input_filehandle>) { # check for trigger condition/substitution pattern sequence. # ASSUME: 'b' and 'Weis' cannot both occur in same line. if ($line =~ m{ \A $pre_b b }xms) { # flag 'b' was seen in this line. $b_was_seen = 1; } elsif ($b_was_seen) { # attempt substitution if 'b' seen in previous line. $line =~ s{ \A $pre_weis \K Weis }{Schwarz}xms; $b_was_seen = 0; } # write each (possibly altered) line to output file. print $output_filehandle $line; }
    (Again, this assumes your Perl version is 5.10+.) The if-test could also be done, perhaps more concisely, with a  .. flip-flop operator (see Range Operators in scalar context in perlop), but verbosity may be a virtue here.


    Give a man a fish:  <%-{-{-{-<

Re: Regex with condition
by AnomalousMonk (Archbishop) on Nov 19, 2015 at 13:19 UTC

    Further to your updated OP:

    Why are you substituting newlines with spaces? The first approach given in my reply above, the one based on slurping the entire file (as you seem now to be doing), depends on having all newlines preserved. If your final text needs to be without newlines, I think the best time to get rid of them would be after all changes to the text have been made because those changes are all defined (or were defined in the original Original Post) in terms "lines", i.e., with reference to newlines.

    However, also note that if the "random amount of other text" preceding the  'b' in a trigger line may contain any whitespace, then the
        my $pre = qr{ \S+ \s }xms;
    regex will have to be changed accordingly.

    Update: When updating a post, please do not destroy original content; instead, indicate defunct content as such (e.g., with  <strike> ... </STRIKE> tags or in a brief note). Likewise, indicate added content in some way. The Golden Rule: Thou shalt not destroy the context of previous replies. Please see How do I change/delete my post?.


    Give a man a fish:  <%-{-{-{-<

      Thank You for your help and patience and replying again. I have to excuse for my formal mistakes in the update, I should have read "How do I change ..." before. I have changed / replaced my script with your solution and I have created a small testfile

      This is my script now:

      my $t = do { local $/; <>; }; my $pre = qr{ \S+ \s }xms; my $post = qr{ \s [^\n]+ \n [^\{]+ \{ }xms; ;; $t =~ s{ ^ $pre b $post \K Weiss \b} {Schwarz}xmsg; print qq{$t};

      This was the input:

      .......... [FEN "1B3k2/2R1b3/4p1pp/2r1P1p1/6P1/7P/5PK1/8 w - - 4 34"] {Weiss am Zug} * .......... [FEN "1B6/1Q4bk/R5p1/1pp4p/8/1n5P/1P4PK/1q6 b - - 2 42"] {Weiss am Zug} * ........... [FEN "1Bn4r/1k3q2/1pp1bbp1/1N1p1p1r/Q2Pp3/6PN/PP2PPB1/K1R4R b - - 0 23 +"] {Weiss am Zug} * ........... [FEN "1K6/4kp2/1P4p1/7p/7P/1r3PP1/1p6/1R6 w - - 3 52"] {Weiss am Zug} *

      The output was exactly the same, Weiss was not changed.

        In my reply above, I wrote

        However, also note that if the "random amount of other text" preceding the  'b' in a trigger line may contain any whitespace, then the
            my $pre = qr{ \S+ \s }xms;
        regex will have to be changed accordingly.
        In the OP, a typical  'b' line was
            '1n5P/1P4PK/1q6 b - - 2 42"]'
        In the post to which this is a reply, a typical  'b' line is
            '[FEN "1B6/1Q4bk/R5p1/1pp4p/8/1n5P/1P4PK/1q6 b - - 2 42"]'
        with more than one whitespace character before the 'b'. Accordingly, I have altered the $pre regex to
            my $pre  = qr{ \S+ \s \S+ \s }xms;
        (This is not what I would consider a "robust" regex, but you will have to provide a regex that best matches your data if you wish an improvement.)

        In addition, in the OP a typical  'Weiss' line had leading whitespace; in the most recent post, it has none. Accordingly, I have altered the $post regex to
            my $post = qr{ \s [^\n]+ \n [^\{]* \{ }xms;
        (note [^\{]* vice [^\{]+).

        I have downloaded your latest data to my file 1148125.dat. With the changes above and your latest data:

        c:\@Work\Perl\monks\OldChamp>perl -wMstrict -e "use 5.010; ;; my $t = do { local $/; <>; }; print qq{[[$t]] \n\n}; ;; my $pre = qr{ \S+ \s \S+ \s }xms; my $post = qr{ \s [^\n]+ \n [^\{]* \{ }xms; ;; $t =~ s{ ^ $pre b $post \K Weiss \b } {Schwarz}xmsg; ;; print qq{<<$t>> \n\n}; " 1148125.dat [[.......... [FEN "1B3k2/2R1b3/4p1pp/2r1P1p1/6P1/7P/5PK1/8 w - - 4 34"] {Weiss am Zug} * .......... [FEN "1B6/1Q4bk/R5p1/1pp4p/8/1n5P/1P4PK/1q6 b - - 2 42"] {Weiss am Zug} * ........... [FEN "1Bn4r/1k3q2/1pp1bbp1/1N1p1p1r/Q2Pp3/6PN/PP2PPB1/K1R4R b - - 0 23 +"] {Weiss am Zug} * ........... [FEN "1K6/4kp2/1P4p1/7p/7P/1r3PP1/1p6/1R6 w - - 3 52"] {Weiss am Zug} * ]] <<.......... [FEN "1B3k2/2R1b3/4p1pp/2r1P1p1/6P1/7P/5PK1/8 w - - 4 34"] {Weiss am Zug} * .......... [FEN "1B6/1Q4bk/R5p1/1pp4p/8/1n5P/1P4PK/1q6 b - - 2 42"] {Schwarz am Zug} * ........... [FEN "1Bn4r/1k3q2/1pp1bbp1/1N1p1p1r/Q2Pp3/6PN/PP2PPB1/K1R4R b - - 0 23 +"] {Schwarz am Zug} * ........... [FEN "1K6/4kp2/1P4p1/7p/7P/1r3PP1/1p6/1R6 w - - 3 52"] {Weiss am Zug} * >>
        At last,  Weiss has changed.


        Give a man a fish:  <%-{-{-{-<

Re: Regex with condition
by Laurent_R (Canon) on Nov 19, 2015 at 08:18 UTC
    I can't test right now on my mobile device, but something like this should presumablly work:
    while (my $line = <$IN> ) { $line =~ s/Weiß/Schwarz/g if /\sb\s/ .. (not /\sb\s/);