Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Re: 1000000th question about regex lol (updated)

by AnomalousMonk (Archbishop)
on Dec 07, 2019 at 11:55 UTC ( [id://11109801]=note: print w/replies, xml ) Need Help??


in reply to 1000000th question about regex lol

It's hard to tell just what your requirements are from the posted code and of course there's no test set (please see How to ask better questions using Test::More and sample data), but here's a regex-based attempt. More degenerate and edge/corner-case test cases are needed.
File replace_1.pl:

use 5.010; # needs // (defined-or), \K regex extension use strict; use warnings; use Test::More 'no_plan'; use Test::NoWarnings; use Data::Dump qw(dd); # for debug my @Tests = ( 'replace ALL instances of search string (left-to-right)', [ 'ABC ABC CBA ACB CAB BAC CAB CAB CARAB CACBA AXCK', ' A', '-@', undef, 'ABC-@BC CBA-@CB CAB BAC CAB CAB CARAB CACBA-@XCK', 'unlimited replacements', ], [ 'ABABA', 'A', 'xx', undef, 'xxBxxBxx', 'unlimited replacements', ] +, [ 'ABC ABC CBA ACB CAB BAC CAB CAB CARAB CACBA AXCK', ' A', '-@', 3, 'ABC-@BC CBA-@CB CAB BAC CAB CAB CARAB CACBA-@XCK', ], [ 'ABABA', 'A', 'xx', 3, 'xxBxxBxx', 'unlimited replacements', ], [ 'ABC ABC CBA ACB CAB BAC CAB CAB CARAB CACBA AXCK', ' A', '-@', 4, 'ABC-@BC CBA-@CB CAB BAC CAB CAB CARAB CACBA-@XCK', ], [ 'ABABA', 'A', 'xx', 4, 'xxBxxBxx', 'unlimited replacements', ], [ 'ABC ABC CBA ACB CAB BAC CAB CAB CARAB CACBA AXCK', ' A', '-@', 99, 'ABC-@BC CBA-@CB CAB BAC CAB CAB CARAB CACBA-@XCK', ], [ 'ABABA', 'A', 'xx', 99, 'xxBxxBxx', 'unlimited replacements', ], 'replace ALL instances of search string (right-to-left)', [ 'ABC ABC CBA ACB CAB BAC CAB CAB CARAB CACBA AXCK', ' A', '-@', -3, 'ABC-@BC CBA-@CB CAB BAC CAB CAB CARAB CACBA-@XCK', ], [ 'ABABA', 'A', 'xx', -3, 'xxBxxBxx', 'unlimited replacements', ], [ 'ABC ABC CBA ACB CAB BAC CAB CAB CARAB CACBA AXCK', ' A', '-@', -4, 'ABC-@BC CBA-@CB CAB BAC CAB CAB CARAB CACBA-@XCK', ], [ 'ABABA', 'A', 'xx', -4, 'xxBxxBxx', 'unlimited replacements', ], [ 'ABC ABC CBA ACB CAB BAC CAB CAB CARAB CACBA AXCK', ' A', '-@', -99, 'ABC-@BC CBA-@CB CAB BAC CAB CAB CARAB CACBA-@XCK', ], [ 'ABABA', 'A', 'xx', -99, 'xxBxxBxx', 'unlimited replacements', ], 'replace N instances of search string (left-to-right)', [ 'ABC ABC CBA ACB CAB BAC CAB CAB CARAB CACBA AXCK', ' A', '-@', 0, 'ABC ABC CBA ACB CAB BAC CAB CAB CARAB CACBA AXCK', ], [ 'ABABA', 'A', 'xx', 0, 'ABABA', ], [ 'ABC ABC CBA ACB CAB BAC CAB CAB CARAB CACBA AXCK', ' A', '-@', 1, 'ABC-@BC CBA ACB CAB BAC CAB CAB CARAB CACBA AXCK', ], [ 'ABABA', 'A', 'xx', 1, 'xxBABA', ], [ 'ABC ABC CBA ACB CAB BAC CAB CAB CARAB CACBA AXCK', ' A', '-@', 2, 'ABC-@BC CBA-@CB CAB BAC CAB CAB CARAB CACBA AXCK', ], [ 'ABABA', 'A', 'xx', 2, 'xxBxxBA', ], 'replace N instances of search string (right-to-left)', [ 'ABC ABC CBA ACB CAB BAC CAB CAB CARAB CACBA AXCK', ' A', '-@', -1, 'ABC ABC CBA ACB CAB BAC CAB CAB CARAB CACBA-@XCK', '1 right-most replacement', ], [ 'ABABA', 'A', 'xx', -1, 'ABABxx', '1 right-most replacement', ], [ 'ABC ABC CBA ACB CAB BAC CAB CAB CARAB CACBA AXCK', ' A', '-@', -2, 'ABC ABC CBA-@CB CAB BAC CAB CAB CARAB CACBA-@XCK', '2 right-most replacements', ], [ 'ABABA', 'A', 'xx', -2, 'ABxxBxx', '2 right-most replacements', ], ); # end @Tests VECTOR: for my $ar_vector (@Tests) { if (not ref $ar_vector) { note $ar_vector; next VECTOR; } my ($str, $srch, $repl, $n, $expect, $cmnt) = @$ar_vector; my $n_cmnt = $n // 'max'; my $comment = "'$srch' -> '$repl' $n_cmnt times"; $comment .= " ($cmnt)" if defined $cmnt; is Replace($str, $srch, $repl, $n), $expect, $comment; } # end for VECTOR done_testing; exit; # subroutines ################################################ sub Replace { # dd '===', \@_, '==='; # for debug my ($string, # string to search/replace $srch, # search string (no regex metacharacters) $repl, # optional: replacement string (default: '') $n, # optional: number of replacements (default: infinit +e) ) = @_; # handle degenerate cases. return '' unless defined $string; return $string unless defined $srch; # default replacement string: empty string. $repl //= ''; # replace left-to-right if n is not defined or >= 0. my $replace_left_to_right = ! defined($n) || $n >= 0; # make replacement count positive unless undefined/infinite. $n = defined $n ? abs($n) : -1; # escape all regex metacharacters in search substring. $srch = quotemeta $srch; # dd '===', [ $string, $srch, $repl, $n, ], '==='; # for debug # regex pattern matches if a search string is not present. my $rx_not_srch = qr{ (?! $srch) . }xms; my $rx_srch = $replace_left_to_right # replace search string n times from left. ? qr{ $rx_not_srch* \K $srch }xms # replace search string n times from right. : qr{ (?: $rx_not_srch* $srch)* $rx_not_srch* \K $srch }xms ; # dd '===', $rx_srch, $n, '==='; # for debug # n > 0: replace if replacement possible. 1 while $n-- && $string =~ s{ \G $rx_srch }{$repl}xms; return $string; }
Output:
c:\@Work\Perl\monks\harangzsolt33>perl replace_1.pl # replace ALL instances of search string (left-to-right) ok 1 - ' A' -> '-@' max times (unlimited replacements) ok 2 - 'A' -> 'xx' max times (unlimited replacements) ok 3 - ' A' -> '-@' 3 times ok 4 - 'A' -> 'xx' 3 times (unlimited replacements) ok 5 - ' A' -> '-@' 4 times ok 6 - 'A' -> 'xx' 4 times (unlimited replacements) ok 7 - ' A' -> '-@' 99 times ok 8 - 'A' -> 'xx' 99 times (unlimited replacements) # replace ALL instances of search string (right-to-left) ok 9 - ' A' -> '-@' -3 times ok 10 - 'A' -> 'xx' -3 times (unlimited replacements) ok 11 - ' A' -> '-@' -4 times ok 12 - 'A' -> 'xx' -4 times (unlimited replacements) ok 13 - ' A' -> '-@' -99 times ok 14 - 'A' -> 'xx' -99 times (unlimited replacements) # replace N instances of search string (left-to-right) ok 15 - ' A' -> '-@' 0 times ok 16 - 'A' -> 'xx' 0 times ok 17 - ' A' -> '-@' 1 times ok 18 - 'A' -> 'xx' 1 times ok 19 - ' A' -> '-@' 2 times ok 20 - 'A' -> 'xx' 2 times # replace N instances of search string (right-to-left) ok 21 - ' A' -> '-@' -1 times (1 right-most replacement) ok 22 - 'A' -> 'xx' -1 times (1 right-most replacement) ok 23 - ' A' -> '-@' -2 times (2 right-most replacements) ok 24 - 'A' -> 'xx' -2 times (2 right-most replacements) 1..24 ok 25 - no warnings 1..25

Update: It finally dawned on me that left/right search could be a lot simpler. This

my $rx_srch = $replace_left_to_right ? qr{ .*? \K $srch }xms # search for leftmost match : qr{ .* \K $srch }xms # search for rightmost match ;
works with all the test cases posted above and a few more I've added since posting. (And  $rx_not_srch is no longer needed.)


Give a man a fish:  <%-{-{-{-<

Replies are listed 'Best First'.
Re^2: 1000000th question about regex lol (updated)
by harangzsolt33 (Chaplain) on Dec 07, 2019 at 15:39 UTC
    Yes, I am sorry. Here is the explanation: I wanted to write a sub that does not always replace every occurrence of a substring. The caller can specify to only replace the first two matches or the last three matches or anything. But when every single match has to be replaced, I realized that I can use a regex to do that. But the regex replace doesn't get to run unless you omit the fourth argument.
      The caller can specify to only replace the first two matches or the last three matches or anything.

      By "or anything", do you mean that it should be possible to replace a substring that's somewhere in the middle, but not anchored at either end? My understanding of the code and discussion so far is that any sequence of substring replacements must be anchored to an end of the string, but "or anything" makes me wonder. If you have a counterexample, please post it as a test case.


      Give a man a fish:  <%-{-{-{-<

        Oh, sorry. By "anything" I meant any number.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11109801]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (7)
As of 2024-03-28 12:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found