
Limiting number of regex matches

by eversuhoshin (Sexton)
on Sep 25, 2012 at 19:12 UTC ( #995621=perlquestion )

eversuhoshin has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I was wondering if there is a way to limit the number of regex matches.

    my $str='dog dog dog dog dog';
    #my $count4=()=$str=~m/dog/g;
    my $count5=()=$str=~m/dog/ for{1,3};
    print "$count4 \n";
    print "$count5 \n";

From the code above, is there a way to match "dog" only three times instead of counting all occurrences of "dog"? I have read somewhere that "for" may work, but I am not sure. Do I have to create a for loop? Is there no other way? Thank you so much!

Replies are listed 'Best First'.
Re: Limiting number of regex matches
by davido (Cardinal) on Sep 25, 2012 at 21:17 UTC

    $count++ while $count < $limit && $str =~ m/\bdog\b/g;


      I do like Dave's answer. But there is one thing that he didn't tell us which is that $count has to be initialized before calling this statement when using strict and warnings. $limit could have been a constant and I think that is the same.
      #!/usr/bin/perl -w
      use strict;

      my $str='dog dog horse dog cow dog pig dog';
      my $count =0; # won't work unless $count initialized to 0
                    # simple declaration of "my $count; or a "my"
                    # variable within this type of complex perl
                    # statement doesn't work under strict and warnings.
                    # Or at least I've not been able get this
                    # kind of stuff to work before.
                    # But given my caveats, this does work!
                    # whether it is better or not is left to
                    # the readers..
      my $limit =3;

      $count++ while $count < $limit && $str =~ m/dog/g;
      print "$count dogs were counted\n";

      my $str2 = "horse dog cow";
      $count = 0;  #needed for initialization
      $count++ while $count < $limit && $str2 =~ m/dog/g;
      print "$count dogs were counted\n";
      __END__
      3 dogs were counted
      1 dogs were counted
      Now one good thing about Dave's code is that it will stop after a certain number of matches, or at least I think that is what is going to happen (the match global gizmo can be modulated and monitored as it progresses). Whether or not that really matters depends upon the length of the string and the number of "dogs". I would claim that with fewer than 20 animals, it doesn't matter at all. Once the string gets much bigger than that, well, it could matter. Software is art with science as a base, but there are exceptions to every "rule". What is best in this application is just not known. That's why I show some examples of how to use Dave's code. It's a good idea and worthy of consideration, especially if the data set is very large.
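
One way to check that the scan really stops early is to look at pos() after the loop. The following is only a minimal sketch that reuses the string and limit from the example above; the variable name $stopped_at is an assumption made for illustration.

```perl
use strict;
use warnings;

my $str   = 'dog dog horse dog cow dog pig dog';
my $limit = 3;
my $count = 0;

# Same statement as in Dave's reply: the && short-circuits once the
# limit is reached, so no fourth match is attempted.
$count++ while $count < $limit && $str =~ m/dog/g;

# pos() reports the offset just past the last successful m//g match.
my $stopped_at = pos($str);
print "$count dogs counted, scan stopped at offset $stopped_at of ",
      length($str), "\n";
```

Because the loop exits through the count test and not through a failed match, the position is kept, so a later m//g on the same string would continue from that offset. Assigning undef to pos($str) resets it.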

        To be accurate, it does work with or without initializing $count to zero. But on the first iteration, the undefined $count will be treated as though it were zero, and a warning will be generated letting the programmer know that he probably should initialize $count to zero explicitly before using it in the context of a numeric comparison (assuming warnings are enabled, as they probably ought to be).
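
A small sketch of that behaviour, assuming a __WARN__ handler (added here purely for illustration) that collects warnings instead of printing them to STDERR:

```perl
use strict;
use warnings;

# Collect warnings in an array so they can be inspected afterwards.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $str   = 'dog dog dog dog dog';
my $limit = 3;
my $count;    # deliberately left undefined

# The first numeric comparison sees undef, treats it as 0, and warns.
$count++ while $count < $limit && $str =~ m/dog/g;

print "count = $count\n";
print "warnings seen: ", scalar(@warnings), "\n";
print $warnings[0] if @warnings;
```

The count still comes out right; the only difference from the initialized version is the "uninitialized value" warning on the first comparison.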

        As for why the example doesn't explicitly use strict, first it seemed that the OP already had a handle on how to declare lexical variables, and second, "Well, because it's a four-line one-line example program I concocted as an example in my Usenet PerlMonks article---duh!"

        (I hope the intended humor isn't lost in this post, your point is valid.)

        Oh, and you're correct; the process stops as soon as the $limitth match occurs, which is a good approach since it stops extra work from happening. Think of it as the difference between List::Util's first function, and the core's grep.
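
To illustrate the first/grep comparison, here is a rough sketch that counts how many elements each one inspects. The @animals list and the counter names are made up for this example.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @animals = qw(dog horse dog cow dog);

# first() returns as soon as the block is true for some element.
my $first_tested = 0;
my $first_dog = first { $first_tested++; $_ eq 'dog' } @animals;

# grep() always runs the block for every element.
my $grep_tested = 0;
my @all_dogs = grep { $grep_tested++; $_ eq 'dog' } @animals;

print "first tested $first_tested element(s)\n";
print "grep tested $grep_tested element(s) and found ",
      scalar(@all_dogs), " dogs\n";
```

The limited while loop behaves like first here, except that it stops at the $limit-th hit and not at the first one.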


Re: Limiting number of regex matches
by Marshall (Canon) on Sep 25, 2012 at 20:42 UTC
    The easiest way is not to limit the number of regex matches, but to trim the results with a list slice.
    Let match global do its thing and then return the first 3 matches.
    #!/usr/bin/perl -w
    use strict;

    my $str='dog dog horse dog cow dog pig dog';

    my @dogs = ($str =~ m/dog/g)[0..2];
    print "number of dogs = ".@dogs,"\n";
    # the . concatenation puts @dogs in a scalar context
    # same as print scalar(@dogs)
    print "@dogs\n";

    ##################
    # so what happens if there aren't 3 dogs?
    # one way is to use map to filter the undef's out
    # To return a "nothing at all" from map, use ()
    # undef is a value and that won't work...
    #
    # This returns a maximum of 3 dogs if there are that
    # many or more dogs, otherwise it returns less.
    #
    my $second_pack = "dog horse cow";
    @dogs = map{$_ or ()}($second_pack =~ /dog/g)[0..2];
    print "\nsecond pack = ".@dogs, " dogs\n";
    print "@dogs";
    __END__
    number of dogs = 3
    dog dog dog

    second pack = 1 dogs
    dog
Re: Limiting number of regex matches
by Anonymous Monk on Sep 25, 2012 at 20:27 UTC
    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Data::Dump;

    my $str ='dog dog dog dog dog';
    my @dogs ;
    while( @dogs < 3 and $str =~ /(dog)/g ){
        push @dogs, $1;
    }
    dd \@dogs;
    __END__
    ["dog", "dog", "dog"]

      Or Maybe this:

      use warnings;
      use strict;

      my $str = 'dog dog dog dog dog';
      print join " " => ( split /\s+/, $str )[ 0 .. 2 ];
      __END__
      dog dog dog

      If you tell me, I'll forget.
      If you show me, I'll remember.
      if you involve me, I'll understand.
      --- Author unknown to me
        That doesn't limit the number of matches, it still counts/matches every single dog in the string
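
If the split approach is kept, one possible way around that is split's optional third argument (LIMIT), which caps the number of fields produced. This is only a sketch; the names $want, @fields and @first_three are assumptions for the example.

```perl
use strict;
use warnings;

my $str  = 'dog dog dog dog dog';
my $want = 3;

# With LIMIT = $want + 1, split stops after producing $want fields and
# leaves the unsplit remainder of the string in the final field.
my @fields      = split /\s+/, $str, $want + 1;
my @first_three = @fields[ 0 .. $want - 1 ];

print join( ' ', @first_three ), "\n";
print "leftover: $fields[-1]\n";
```

Note that this still splits on whitespace rather than matching "dog", so it only fits when every word is of interest, and the slice would contain undef entries if the string held fewer than $want words.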
Re: Limiting number of regex matches
by AnomalousMonk (Bishop) on Sep 26, 2012 at 02:33 UTC

    Perhaps another way is with a counted quantifier (see Quantifiers in perlre):

    >perl -wMstrict -le
    "my $mail = 'Spam mail sPam spAm stealthspam mail spaM';
     print qq{you have the following mail: '$mail'};
     ;;
     my $spam = qr{ (?i) \b spam \b }xms;
     for my $n (2 .. 5) {
       if (my ($spams) = $mail =~ m{ ((?: .*? $spam){$n}) }xms) {
         print qq{there are $n spam emails in mail: '$spams'};
         }
       else {
         print qq{there are NOT $n spam emails in mail};
         }
       }
    "
    you have the following mail: 'Spam mail sPam spAm stealthspam mail spaM'
    there are 2 spam emails in mail: 'Spam mail sPam'
    there are 3 spam emails in mail: 'Spam mail sPam spAm'
    there are 4 spam emails in mail: 'Spam mail sPam spAm stealthspam mail spaM'
    there are NOT 5 spam emails in mail

Node Type: perlquestion [id://995621]
Approved by philipbailey