PerlMonks  

pattern matching a limited number of times

by bob49 (Initiate)
on Sep 14, 2002 at 03:04 UTC (#197779, perlquestion)
bob49 has asked for the wisdom of the Perl Monks concerning the following question:

OK, I'm a Perl newbie, and I'm glad you folks are around.

I'd have thought this question had an easy answer, but my research turned up nothing. Here's the issue:

Take a substitution command as follows:

$abcd =~ s/ab/cd/i;

As written, it matches the first "ab" and changes it to "cd". Easy. To make it work on every instance of "ab" in the doc, add the "g" modifier:

$abcd =~ s/ab/cd/gi;

But what if I want it to match UP TO the first 5 instances when I'm not sure how many instances are in the doc? Let's say there may be 1 or 4 or 7 or twenty instances, but in all cases I just want up to the first 5.

I've seen the following notation:

* - Match zero or more times
+ - Match one or more times
? - Match zero or one time
{X} - Match X times EXACTLY.
{X,} - Match X or more times
{X,Y} - Match X to Y times

But the "match X to Y times" form, for example, won't work for me if the document contains more than Y instances and I only want the first 5. It only MATCHES if the required range exists; it apparently has nothing to do with the number of substitutions actually made, which will be zero if the instances don't fall within the range...
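
For what it's worth, here is a small sketch of why the {X,Y} quantifier is the wrong tool here: it counts adjacent repetitions inside a single match, not the number of substitutions performed. The sample string is made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample text: a run of three adjacent "ab"s, then a separate one.
my $text = 'ababab xx ab';

# {1,5} applies to the group (?:ab) within ONE match, so the whole run
# "ababab" is consumed as a single match and replaced by a single "cd".
(my $quantified = $text) =~ s/(?:ab){1,5}/cd/i;

# Without the quantifier (and without /g), only the first "ab" is replaced.
(my $single = $text) =~ s/ab/cd/i;

print "$quantified\n";   # cd xx ab
print "$single\n";       # cdabab xx ab
```

Neither form limits how many separate matches get replaced, which is what the question is really after.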

OK, once more, lest I've confused you. I want the substitution to occur only on the first 5 instances, or fewer, if fewer than 5 exist.

Surely there's an easy way to do this?

Re: pattern matching a limited number of times
by PodMaster (Abbot) on Sep 14, 2002 at 03:48 UTC
    I try to keep code out of regexes, so I won't show you that example.
    $foo =~ s/ab/cd/i for 1..5;

    ____________________________________________________
    ** The Third rule of perl club is a statement of fact: pod is sexy.

      $foo =~ s/ab/cd/i for 1..5;

      The above seems to work very well. I'm not sure I see the advantages to the more complex phrasings below. Please correct me if I'm wrong.

      Thanks, PodMaster.
        The main problem is that it doesn't deal well with overlapping matches (i.e. cases where your regex matches part of your replacement string).

        Suppose you have the string:

        "The wodchuck at the zo stod on the stoop"
        You realize that it will only make sense after replacing the first three 'o's with 'oo's:
        my $str = "The wodchuck at the zo stod on the stoop";
        $str =~ s/o/oo/i for 1..3;
        print "$str\n";
        Oops, that yields:
        The woooodchuck at the zo stod on the stoop
        The other solutions would have produced:
        The woodchuck at the zoo stood on the stoop

        -Blake
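
One way to get the second result, sketched here as an alternative that does not appear in the replies, is a single s///g pass with a counter in an /e replacement. The names $limit and $count are assumptions made for this illustration.

```perl
use strict;
use warnings;

my $str   = "The wodchuck at the zo stod on the stoop";
my $limit = 3;    # assumed cap on the number of replacements
my $count = 0;    # matches seen so far

# /g visits each "o" of the original text once, left to right, so text
# inserted by the replacement is never rescanned. /e evaluates the
# replacement as code: the first $limit matches become "oo", and later
# matches are put back unchanged via $1.
$str =~ s/(o)/ $count++ < $limit ? 'oo' : $1 /gie;

print "$str\n";   # The woodchuck at the zoo stood on the stoop
```

Like any s///g, this still walks the whole string after the limit is reached, so it trades efficiency on long inputs for brevity.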

Re (using assertions): pattern matching a limited number of times
by bart (Canon) on Sep 14, 2002 at 09:13 UTC
    I guess this would be a good place for assertions. An assertion is a piece of Perl code embedded in the regex that decides whether a match may succeed.

    Now the current (perl5) syntax for a regex assertion is very awkward. This will do it:

    /(?(?{NOTCOND})(?!))/
    NOTCOND is Perl code that returns true when you want the assertion to fail: only in that case is an attempt made to match the regex snippet /(?!)/, a negative lookahead that never matches.

    So, you can try this:

    my $i = 0; s/ab(?(?{++$i>5})(?!))/cd/g;

    This will indeed stop matching after the fifth match, producing the desired result; but it will not stop trying: it will go through the whole string, and if your string contains 50 substrings "ab", then the assertion code will be called 50 times. So for long strings, it won't be too efficient.
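
A minimal sketch of that behaviour, using a made-up string of 50 "ab" substrings and a hypothetical $calls counter in place of $i: only five replacements are made, yet the counter ends up well past five because the code block keeps running for later candidates.

```perl
use strict;
use warnings;

my $text  = 'ab' x 50;   # made-up input with 50 candidate matches
my $calls = 0;           # how many times the embedded code block runs

# Same construct as above: once the counter exceeds 5, the conditional
# selects (?!), which always fails, so that candidate is rejected.
$text =~ s/ab(?(?{ ++$calls > 5 })(?!))/cd/g;

print "code block ran $calls times\n";
print substr($text, 0, 14), "\n";   # cdcdcdcdcdabab
```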

      How about a modified version of japhy's code?
      #!/usr/bin/perl -wT
      use strict;

      my $string  = 'abc ' x 10;
      my $search  = 'ab';
      my $replace = 'CD';
      print "$string\n";

      my $i = 0;
      while ($string =~ /$search/gi) {
          # Overwrite the matched span in place.
          substr($string, $-[0], $+[0] - $-[0], $replace);
          last if ++$i == 4;    # stop after four replacements
          # Tell m//g where to resume now that the string has changed.
          pos($string) = $-[0] + 1;
      }
      print "$string\n";
      __END__
      abc abc abc abc abc abc abc abc abc abc
      CDc CDc CDc CDc abc abc abc abc abc abc
      Update: Doh! I didn't see your very similar solution before I posted....

      -Blake

        Doh! I didn't see your very similar solution before I posted....
        It is very similar, isn't it? And I hadn't seen japhy's solution before I posted mine. Now, it left me wondering: is setting pos() necessary, or will Perl DWIM and continue matching at the same point in the (original) string without help? That's what s///g does.

        Well, it turns out that Perl appears to use an offset into the string to keep track of where it was, so setting pos() is necessary.

        $_ = 'abxy' x 12;
        for (my $i = 0; m/ab/g and $i++ < 5;) {
            substr($_, $-[0], $+[0] - $-[0]) = 'abXab';
        }
        print;
        Result: abXabXabXabXabXabxyabxyabxyabxyabxyabxyabxyabxyabxyabxyabxyabxy

        Not good.

        $_ = 'abxy' x 12;
        for (my $i = 0; m/ab/g and $i++ < 5;) {
            substr($_, $-[0], $+[0] - $-[0]) = 'abXab';
            pos = $-[0] + 5;    # skip past the 5-character replacement
        }
        print;
        Result: abXabxyabXabxyabXabxyabXabxyabXabxyabxyabxyabxyabxyabxyabxyabxy

        Good.

        P.S. I noticed this, which is also very awkward:

        my $length = length(substr($_, 2, 2) = 'abXab');
        print $length;
        This prints 2, not 5. What happened to the rule that the value of an assignment, used as an expression, is what you assigned?
Re (m//g in a loop): pattern matching a limited number of times
by bart (Canon) on Sep 14, 2002 at 09:19 UTC
    Another approach, entirely different from using assertions, is to match your pattern using m//g in a while/for loop. It appears to work rather well. Note that I've changed the substitution string so that it's a different length from what it matched, just to make sure it does work.
    $_ = 'abxy' x 15;
    for (my $i = 0; m/ab/g and $i++ < 5;) {
        substr($_, $-[0], $+[0] - $-[0]) = 'ABC';
    }
    print;
    Result: ABCxyABCxyABCxyABCxyABCxyabxyabxyabxyabxyabxyabxyabxyabxyabxyabxy

    That looks about right to me.

    Update: See my follow-up in another subthread: in general, you need to set pos() whenever you change the original string before continuing the search.
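
Putting the loop and the pos() fix together, a reusable version might look like the sketch below. The name replace_first_n and its argument order are hypothetical, not from any module or from the replies above; it assumes a plain-string replacement and works on a copy of the text.

```perl
use strict;
use warnings;

# Hypothetical helper: replace at most $limit matches of $pattern in $text
# with the literal string $replacement, and return the modified copy.
sub replace_first_n {
    my ($text, $pattern, $replacement, $limit) = @_;
    my $done = 0;
    # Check the limit first so no further match is attempted once it is hit.
    while ($done < $limit && $text =~ /$pattern/g) {
        my ($start, $len) = ($-[0], $+[0] - $-[0]);
        substr($text, $start, $len) = $replacement;
        # Resume just past the inserted text so it is never rescanned.
        pos($text) = $start + length $replacement;
        $done++;
    }
    return $text;
}

print replace_first_n('abxy' x 12, qr/ab/, 'abXab', 5), "\n";
print replace_first_n("The wodchuck at the zo stod on the stoop", qr/o/i, 'oo', 3), "\n";
```

Unlike the assertion approach, the loop stops searching as soon as the limit is reached, so the rest of a long string is never scanned.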
