http://www.perlmonks.org?node_id=1016780

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Something like this:

my $count = 0; for my $string (@array) { $count++ if $string =~ /$substring/; }

counts the number of strings in the array that contain the substring,

and something like this:

my $count2=0; while ($array[1] =~ /$substring/g) {$count2++;}

counts the number of occurrences of the substring in a single string.
But how can I combine them?

Thank you.

Re: How to count substrings in an array?
by mbethke (Hermit) on Feb 03, 2013 at 07:13 UTC

    The "goatse operator" can make this a bit shorter and faster: $count =()= $string =~ /RE/g; counts the number of matches in $string, so

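    my $count = 0; for (@array) { $count +=()= /RE/g; }
    does what you want. The empty-list assignment ()= evaluates the regex in list context, so it returns the list of all matches; a list assignment in scalar context then returns the number of elements on its right-hand side, so you get the match count without storing the list anywhere.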
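A self-contained sketch of the loop above. It assumes the substring should be matched literally, so it adds \Q...\E quoting that the one-liner does not use, and it counts non-overlapping matches only; the sub name count_matches and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: total number of non-overlapping occurrences
# of $substring across all strings in @strings.
sub count_matches {
    my ($substring, @strings) = @_;
    my $count = 0;
    for (@strings) {
        # () = ... forces list context, so /g returns every match;
        # the list assignment in scalar context yields how many there were.
        $count += () = /\Q$substring\E/g;
    }
    return $count;
}

my @array = qw(abc xabcx abcabc def);
print count_matches('abc', @array), "\n";    # prints 4
```

With \Q...\E, a substring such as 'a.c' matches only the literal text a.c rather than any three characters that start with a and end with c.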

      This works where successive matches do not overlap. But consider the case where the string to search is 'ababa' and the substring $t is 'aba'. The substring occurs twice in the string, but /$t/g finds only one match, because after the first match the regex engine resumes the search at the position immediately after the end of that match:

      ababa
         ^ the regex engine starts looking for the second match here

      To get overlapping matches, use a positive lookahead:

      17:58 >perl -Mstrict -wE "my $s = 'ababa'; my $t = 'aba'; my $c = () = $s =~ /(?=$t)/g; say $c;"
      2

      This works, because

      1. a positive lookahead is a zero-width assertion, so the first match does not consume any of the string $s; but
      2. the regex engine advances to (at least) the next character in the string following a successful match.

      See Re: Regex: finding all possible substrings by AnomalousMonk.
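Putting the lookahead together with the loop from the question gives a total of overlapping occurrences across a whole array. This is a sketch, assuming a non-empty substring that should be matched literally (hence \Q...\E); the sub name count_overlapping is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count overlapping occurrences of $substring
# in every string of @strings, using a zero-width lookahead.
sub count_overlapping {
    my ($substring, @strings) = @_;
    my $count = 0;
    for (@strings) {
        # (?=...) consumes nothing, so the engine retries one character
        # later and also finds matches that share characters.
        $count += () = /(?=\Q$substring\E)/g;
    }
    return $count;
}

print count_overlapping('aba', 'ababa'), "\n";             # prints 2
print count_overlapping('aba', qw(abababa ababa)), "\n";   # prints 5
```

An empty substring would match at every position, including the end of each string, so guard against it if that can happen.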

      Hope that helps,

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Goatse operator, huh? Any idea why it isn't mentioned in perlop?

      Also, is there a way to achieve what this operator does (forcing list context) without an assignment? I.e., to use it inside a map { ... } without introducing temporary variables.

        > Goatse operator, huh? Any idea why it isn't mentioned in perlop?

        Maybe because the politically correct name is now "Rolex operator"? ;-)

        > I.e. to use inside a map { ... } without introducing temporary variables.

        DB<107> $s="a"x10
        => "aaaaaaaaaa"
        DB<108> scalar ( ()= $s =~ /a/g )
        => 10
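In the context of the original question, the same scalar( () = ... ) form gives one count per string inside a map block, and summing those gives the total. A sketch with made-up sample data, again quoting the substring with \Q...\E as an assumption:

```perl
use strict;
use warnings;

my @array     = qw(xyz xyx abc xxx);
my $substring = 'x';

# One count per element: scalar( () = ... ) turns the list of matches
# into its length without naming an intermediate variable.
my @per_string = map { scalar( () = /\Q$substring\E/g ) } @array;

# Grand total across the whole array.
my $total = 0;
$total += $_ for @per_string;

print "@per_string\n";   # prints 1 2 0 3
print "$total\n";        # prints 6
```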

        Cheers Rolf

Re: How to count substrings in an array?
by kcott (Archbishop) on Feb 03, 2013 at 07:37 UTC

    You haven't really explained what you're ultimately trying to achieve here. If you put your question in terms of a concrete context, you'll probably get a better answer. Guidelines for doing this can be found here: How do I post a question effectively?

    The following piece of code may do what you want or at least give you some idea of how to proceed.

    $ perl -Mstrict -Mwarnings -E '
        my @ary = qw{xyz xyx abc xxx def zyx};
        my $sub = q{x};
        my @counts = map { scalar @{[/$sub/g]} } grep { /$sub/ } @ary;
        say "All strings = ", scalar @ary;
        say "Strings with $sub in them = ", scalar @counts;
        say "Counts of $sub in strings containing $sub:";
        say for @counts;
    '
    All strings = 6
    Strings with x in them = 4
    Counts of x in strings containing x:
    1
    2
    3
    1
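If you also need to know which string each count belongs to, one possible variation (not from the thread) builds a hash instead of a list. It assumes the strings in the array are unique, since duplicate keys would collapse into one entry; the parentheses inside the map block keep Perl from mistaking the block for an anonymous hash.

```perl
use strict;
use warnings;

my @ary = qw{xyz xyx abc xxx def zyx};
my $sub = q{x};

# string => number of non-overlapping occurrences of $sub,
# for the strings that contain $sub at all.
my %counts = map  { ( $_ => scalar @{ [ /\Q$sub\E/g ] } ) }
             grep { /\Q$sub\E/ } @ary;

# Report in the original array order.
for my $str (grep { exists $counts{$_} } @ary) {
    print "$str: $counts{$str}\n";
}
```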

    -- Ken

Re: How to count substrings in an array?
by BrowserUk (Patriarch) on Feb 03, 2013 at 08:16 UTC

    my $count = 0; $count += map /$substring/g, @array;
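A sketch of why this works, with made-up data and \Q...\E quoting added as an assumption: inside map the match runs in list context, so each element contributes one item per match, while map itself is in scalar context on the right of += (or =), where it returns the total number of items it generated.

```perl
use strict;
use warnings;

my @array     = qw(abc xabcx abcabc def);
my $substring = 'abc';

# map is in scalar context here (right side of +=), so it returns
# how many items the per-element list-context matches produced.
my $count = 0;
$count += map /\Q$substring\E/g, @array;

# The same count with a plain assignment instead of an accumulator.
my $same = map /\Q$substring\E/g, @array;

print "$count $same\n";   # prints 4 4
```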

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      ... and you don't even need the '+' on the += operator:

      >perl -wMstrict -lE "my $s = 'abababa'; my $t = 'aba'; ;; my $c = map /(?=$t)/g, qw(abababa ababa); say $c; "
      5
        and you don't even need the '+' on the += operator:

        Of course I don't. Much nicer!


Re: How to count substrings in an array?
by mbethke (Hermit) on Feb 03, 2013 at 07:11 UTC

    Edit: forum glitch? As is evident from the title, this should have ended up here.


      Sorry, I figured it out and meant to delete it rather than leave a blank. Thank you for your time anyway.
Re: How to count substrings in an array?
by sundialsvc4 (Abbot) on Feb 04, 2013 at 15:51 UTC

    Well, “goatse” is Cute Golf, but I would veto its use in a code review because it simply isn’t instantly obvious that the code works correctly, let alone that it does so in every case.

    Ideally, of course, the code would be backed by a Test::More test suite that exercises every relevant case, proving not only that it handles every input that might be thrown at it, but also that it rejects any invalid string that violates one of its design assumptions. (That’s a lot more work for the programmer, so I usually encounter “nasty bugs in production” instead. So do you. So it goes.)

    What if the substring contains instances of itself, i.e. its tail overlaps its own head?
    For instance, how many occurrences of aba do you want to say occur in the string abababa: two, or three? You have two design choices here, and you must know which one is right for this application in a production setting.
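A sketch of the Test::More approach described above, with one hypothetical function per design choice. The function names, the literal matching via \Q...\E and the decision to reject an empty substring are all assumptions made for illustration:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical: count matches that never share characters.
sub count_non_overlapping {
    my ($needle, $haystack) = @_;
    die "empty substring\n" unless length $needle;
    my $n = () = $haystack =~ /\Q$needle\E/g;
    return $n;
}

# Hypothetical: count every starting position, even if matches overlap.
sub count_overlapping {
    my ($needle, $haystack) = @_;
    die "empty substring\n" unless length $needle;
    my $n = () = $haystack =~ /(?=\Q$needle\E)/g;
    return $n;
}

# The self-overlapping case from the text: 'aba' in 'abababa'.
is( count_non_overlapping('aba', 'abababa'), 2, 'non-overlapping: aba in abababa' );
is( count_overlapping('aba', 'abababa'),     3, 'overlapping: aba in abababa' );

# Cases where both choices agree.
is( count_non_overlapping('x', 'xyx'), 2, 'single-character substring' );
is( count_overlapping('x', 'xyx'),     2, 'single-character substring, overlapping' );
is( count_overlapping('abc', 'def'),   0, 'no match' );

# Regex metacharacters are matched literally because of \Q...\E.
is( count_non_overlapping('a.c', 'abc a.c'), 1, 'dot is literal' );

# A violated design assumption is rejected rather than silently counted.
ok( !eval { count_non_overlapping('', 'abc'); 1 }, 'empty substring is rejected' );

done_testing();
```

Which of the two functions is "correct" depends entirely on what the application needs; the tests simply make that choice explicit.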