http://www.perlmonks.org?node_id=749466

learningperl01 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am hoping someone can point me in the right direction. I have the code shown below. What I am trying to do is print the total number of matches found by the regex, along with the names of the matches that were found.

Currently the script prints the number of matches for each input line instead of the total number. I would like to see the results as shown under "Wanted results" below. Thanks in advance for the help!
#!/usr/bin/perl
use strict;
use warnings;

while (<DATA>) {
    my $string_found = $_;
    my @match = ( $string_found =~ /(first_string|second_string|third_string|fourth_string|fifth_string)/gm );
    print scalar @match," matches found: ",join ", ",@match, "\n";
}
__DATA__
This is a test file matchme
ljldjlfjd l;djfldjlf d test test test dljfldjlfjldjfldjlljdf one second_string
dlfjldfj ljdfldjjf ldjfljdl dfljdlfj dfdlfj three
ljfldjlj dlfjlasdj foiidufoiida matchdf
dljfldsaofuoidfousdaof ladsjflasdof first_string
dlfjodsuofuasdo sadoufosadu foasduf aosduf third_string

__Current results__
0 matches found:
1 matches found: second_string,
0 matches found:
0 matches found:
1 matches found: first_string,
1 matches found: third_string,

__Wanted results__
3 matches found: second_string, first_string, third_string

Re: Printing the count for regex matches
by bellaire (Hermit) on Mar 10, 2009 at 01:26 UTC
    In this case, you want all of the DATA in one go rather than in six separate lines. The easiest way to achieve this is to localize the input record separator $/ (see perlvar for details). Asking for <DATA> in a while loop grabs one record at a time, in this case one line at a time, because the input record separator is newline by default. If you do a local $/, it becomes undef, which effectively removes record separation, so the whole DATA segment is read in at one time.
    #!/usr/bin/perl
    use strict;
    use warnings;

    local $/;    # undef the input record separator: <DATA> now returns everything at once
    while (<DATA>) {
        my $string_found = $_;
        my @match = ( $string_found =~ /(first_string|second_string|third_string|fourth_string|fifth_string)/gm );
        print scalar @match," matches found: ",join ", ",@match, "\n";
    }
    __DATA__
    This is a test file matchme
    ljldjlfjd l;djfldjlf d test test test dljfldjlfjldjfldjlljdf one second_string
    dlfjldfj ljdfldjjf ldjfljdl dfljdlfj dfdlfj three
    ljfldjlj dlfjlasdj foiidufoiida matchdf
    dljfldsaofuoidfousdaof ladsjflasdof first_string
    dlfjodsuofuasdo sadoufosadu foasduf aosduf third_string
    Produces this result:
    3 matches found: second_string, first_string, third_string,
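
If you would rather not change $/ for the rest of the program, one option is to localize it inside a small subroutine. The following is only a sketch: the helper name slurp, the sample text, and the in-memory filehandle are assumptions made for illustration and are not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: read everything left on a filehandle into one string.
sub slurp {
    my ($fh) = @_;
    local $/;               # undef only until this sub returns
    return scalar <$fh>;    # one read now returns the whole remaining input
}

# Made-up sample input, read through an in-memory filehandle.
my $text = "one second_string\nfirst_string two\nthird_string\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $data = slurp($fh);
close $fh;

my @match = $data =~ /(first_string|second_string|third_string)/g;
print scalar(@match), " matches found: ", join(", ", @match), "\n";
```

Because local restores the old value when the subroutine exits, any later line-by-line reads elsewhere in the program still see the default newline separator.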
Re: Printing the count for regex matches
by Lawliet (Curate) on Mar 10, 2009 at 01:28 UTC

    Declare @match before the loop (so that it is not overwritten each time), append any matches to @match, and move the print statement outside the while loop.

    And you didn't even know bears could type.

Re: Printing the count for regex matches
by jethro (Monsignor) on Mar 10, 2009 at 01:32 UTC
    my @match;
    while (<DATA>) {
        my $string_found = $_;
        push @match,( $string_found =~ /(first_string|second_string|third_string|fourth_string|fifth_string)/gm );
    }
    print scalar @match," matches found: ",join ", ",@match, "\n";

    Note that the variable $string_found and the parentheses around the match expression are superfluous.
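
A minimal sketch of the same loop without the $string_found copy and the extra parentheses. The @lines array holds made-up sample input standing in for <DATA>; it is an assumption for illustration only.

```perl
use strict;
use warnings;

# Made-up sample lines standing in for the DATA section.
my @lines = (
    "one second_string two\n",
    "nothing to see here\n",
    "first_string and third_string\n",
);

my @match;
for (@lines) {
    # With no =~ the match applies to $_, and push supplies list context,
    # so every capture found on the line is appended.
    push @match, /(first_string|second_string|third_string|fourth_string|fifth_string)/g;
}
print scalar(@match), " matches found: ", join(", ", @match), "\n";
```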

Re: Printing the count for regex matches
by hbm (Hermit) on Mar 10, 2009 at 01:41 UTC

    How's this? I used (?:) to shorten the regex; and parens in the join to get rid of the dangling comma.

    use strict;
    use warnings;

    my @matches;
    while (<DATA>) {
        push @matches, $1 if /((?:first|second|third|fourth|fifth)_string)/;
    }
    print scalar @matches, " matches found: ", join(", ", @matches), "\n";
      Nice try, but what will happen if you have more than one string matching in one line? You will only catch the first one and miss all subsequent ones.

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
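
To illustrate the point about several matches on one line, here is a small sketch comparing a single match that reads $1 with a /g match in list context. The sample line is made up for the comparison.

```perl
use strict;
use warnings;

# Made-up line containing two of the target strings.
my $line = "first_string then second_string on the same line";

# Without /g the match stops at the first hit, so $1 holds only that one.
my @first_only;
push @first_only, $1 if $line =~ /((?:first|second|third|fourth|fifth)_string)/;

# With /g in list context every non-overlapping hit is returned.
my @all = $line =~ /((?:first|second|third|fourth|fifth)_string)/g;

print scalar(@first_only), " without /g: @first_only\n";
print scalar(@all), " with /g: @all\n";
```

On this line the first form collects one string and the second collects two.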

Re: Printing the count for regex matches
by Marshall (Canon) on Mar 10, 2009 at 09:20 UTC
    First, I congratulate you! This idea of using match global in a list context is a great one!

    I wasn't exactly sure what you wanted as there is some ambiguity in the spec. I interpreted the intent to be: number of the strings that matched anywhere (not total number of matches of those strings) and which ones were they? Code below can be tweaked to do different things.

    A few points that you may find interesting:
    1. To read the entire file into a single Perl variable, no while{} loop is required: just undef the record separator and do a scalar assignment. BTW, you will often see @var=() or $var=() used to clear a variable; for a scalar, $var=() has the same effect as $var=undef, and @var=() empties an array.
    2. It is possible to increment a hash value that doesn't even exist yet! Perl will create it and set the value to 1 on the first increment.
    3. You will see that I "cheated" in the print. The concatenation op, as in "".keys(%seen), forces keys into scalar context; scalar keys(%seen) would have been fine too. This is just a fine point that you may find useful later, e.g., print "blah".@array prints the number of elements after "blah", whereas print "blah", @array prints the elements themselves.
    4. I took the /m out of the regex. /m only changes where ^ and $ match, and this pattern uses neither, so it is not needed here.
    5. You can expand this code to, say, print a table of the number of times each string matched.
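
Points 2 and 3 can be tried in isolation with a short sketch like this one. The word list and the variable names are made up for the demonstration.

```perl
use strict;
use warnings;

my %seen;
# Point 2: the keys do not exist before the first ++; Perl creates them.
$seen{$_}++ for qw(first_string third_string first_string);

# Point 3: two ways to get the number of keys in scalar context.
my $via_concat = "" . keys(%seen);
my $via_scalar = scalar keys %seen;

my @items = sort keys %seen;
my $with_dot   = "blah" . @items;          # array in scalar context: its size
my $with_comma = join "", "blah", @items;  # what print "blah", @items would emit

print "first_string seen $seen{first_string} times\n";
print "$via_concat keys via concatenation, $via_scalar via scalar\n";
print "$with_dot\n";
print "$with_comma\n";
```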

    #!/usr/bin/perl -w
    use strict;

    my %seen;
    $/=undef;            #input record separator doesn't matter any more
    my $data = <DATA>;   #whole file is read into one variable!

    my @match = ( $data =~ /(first_string|second_string|third_string|fourth_string|fifth_string)/g );
    foreach (@match)
    {
        $seen{$_}++;     #yes, you can increment a hash key that
                         #doesn't exist yet!
    }
    #now we have a hash of strings that were seen.
    #also know the number times each string was seen.
    #but that doesn't appear to be necessary to know that here?

    print "".keys(%seen)," matches found: ", join(", ",sort keys(%seen)),"\n";
    __DATA__
    This is a test file matchme
    ljldjlfjd l;djfldjlf d test test test dljfldjlfjldjfldjlljdf one second_string
    dlfjldfj ljdfldjjf ldjfljdl dfljdlfj dfdlfj three
    ljfldjlj dlfjlasdj foiidufoiida matchdf
    dljfldsaofuoidfousdaof ladsjflasdof first_string
    dlfjodsuofuasdo sadoufosadu foasduf aosduf third_string
    __END__
    __above prints:
    3 matches found: first_string, second_string, third_string
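
As one possible way to expand this into a table of how often each string matched, here is a sketch that uses a made-up $data string in place of the slurped DATA section. The column width in the printf format is an arbitrary choice.

```perl
use strict;
use warnings;

# Made-up input standing in for the slurped DATA section.
my $data = "second_string x first_string\ny second_string\nthird_string z\n";

my %seen;
# The for modifier gives the match list context, so every hit is counted.
$seen{$_}++ for $data =~ /(first_string|second_string|third_string|fourth_string|fifth_string)/g;

# One row per distinct string: name, then the number of times it matched.
for my $name (sort keys %seen) {
    printf "%-15s %d\n", $name, $seen{$name};
}
```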
Re: Printing the count for regex matches
by johngg (Canon) on Mar 10, 2009 at 09:33 UTC

    Since you don't seem to be using the captures there's no point in remembering them so just increment the count each time.

    use strict;
    use warnings;

    my $matchCount = 0;
    while( <DATA> )
    {
        $matchCount += () =
            m{( first_string | second_string | third_string | fourth_string | fifth_string)}xg;
    }

    print qq{$matchCount matches found\n};
    __DATA__
    This is a test file matchme
    ljldjlfjd l;djfldjlf d test test test dljfldjlfjldjfldjlljdf one second_string
    dlfjldfj ljdfldjjf ldjfljdl dfljdlfj dfdlfj three
    ljfldjlj dlfjlasdj foiidufoiida matchdf
    dljfldsaofuoidfousdaof ladsjflasdof first_string
    dlfjodsuofuasdo sadoufosadu foasduf aosduf third_string

    Produces:

    3 matches found
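
For anyone unfamiliar with the "() =" part, here is a small sketch of what it does, using a made-up string. Assigning the match to an empty list forces list context, and using that list assignment as a number yields how many values the match returned.

```perl
use strict;
use warnings;

# Made-up text containing three hits.
my $text = "first_string, fifth_string, first_string";

# List assignment evaluated in scalar context gives the number of
# values on its right-hand side, i.e. the number of captures returned.
my $count = () = $text =~ /(first_string|fifth_string)/g;

# For contrast: a match in plain scalar context only reports success.
my $copy  = $text;
my $found = $copy =~ /(first_string|fifth_string)/g;

print "list-assignment count: $count\n";
print "scalar-context result: $found\n";
```

Here $count ends up as 3 while $found is just 1, which is why the empty-list assignment is needed to get a total.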

    I hope this is of interest.

    Cheers,

    JohnGG

    Update: Oops, misread the OP, you are using the captures so ignore this.

    Update 2: Code amended to keep the captures.

    use strict;
    use warnings;

    my @matches = ();
    my $matchCount = 0;
    while( <DATA> )
    {
        $matchCount += my( @caps ) =
            m{( first_string | second_string | third_string | fourth_string | fifth_string)}xg;
        push @matches, @caps;
    }

    print qq{$matchCount matches found: @matches\n};
    __DATA__
    This is a test file matchme
    ljldjlfjd l;djfldjlf d test test test dljfldjlfjldjfldjlljdf one second_string
    dlfjldfj ljdfldjjf ldjfljdl dfljdlfj dfdlfj three
    ljfldjlj dlfjlasdj foiidufoiida matchdf
    dljfldsaofuoidfousdaof ladsjflasdof first_string
    dlfjodsuofuasdo sadoufosadu foasduf aosduf third_string
    3 matches found: second_string first_string third_string