Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Re: Count multiple pattern matches

by Zaxo (Archbishop)
on Dec 07, 2004 at 02:04 UTC ( #412806=note: print w/replies, xml ) Need Help??


in reply to Count multiple pattern matches

One way is almost what you have, but with some changes,

use strict; my (@keywords, %keyword)=qw/foo bar 12345 abcd/; my ($string, %result) = "foobarfoo1234523423412345abcdefsadfabc";
Precompile the regexen, @keyword{@keywords} =  map {qr/\Q$_\E/} @keywords;
Now get the count directly without any named temporary,
$result{$_} = () = $string =~ $keyword{$_} for (keys %keyword);
That's no big change over what you have, but uses some idiomatic optimizations.

Another way is to count within the big regex you mention. You can do that with a code construction in the re,

my @regexen = map { qr/(?:\Q$_\E(?{$result{$_}++}))/ } @keywords; my $re = do { local $" = '|'; qr/@regexen/; };
I've used my favorite tricky way of getting alternation into an array there (qr// is an interpolating quote operator).
$string =~ /$re/g; print "$_: $result{$_}\n" for @keywords;
There, the regex engine should only evaluate the code part if the text has matched, and then restart the regex at pos for the next match. Untested,

Update: Perl qr// dosn't seem to like running the (?{$result{$_}++}) bit. I'm not sure why. Anybody know?

A third way is to munch through the string with index for each word you want to match.

It may be worthwhile to study your text before running the regex matches on it. Benchmark your different approaches, chances are that each will be best for some cases.

After Compline,
Zaxo

Replies are listed 'Best First'.
Re^2: Count multiple pattern matches
by zejames (Hermit) on Dec 07, 2004 at 10:18 UTC

    I've had a look at your pretty solution (with qr). However, it currently does not work, for several reasons.

    The first in the one you mention : it does not compile, because of qr and (?{...}) block. That is explained in perlre, about the (?{...}) block :

    For reasons of security, this construct is forbidden if the regular expression involves run-time interpolation of variables, unless the perilous use re 'eval' pragma has been used (see re), or the variables contain results of qr// operator (see perlop/"qr/STRING/imosx").

    In short, that is to prevent a external variable containing such a block to be executed in the regex engine. The solution is described in the doc : use re 'eval'

    However, some other thing does not work yet : when the regex qr/(?:(\Q$_\E))(?{$result{$_}++})/ is matched, that does not set the %result hash entry, because at execution time, $_ is not what you expect. If you look at the regex created :

    print $re; __DATA__ (?-xism:(?-xism:(?:(foo))(?{$result{$_}++}))|(?-xism:(?:(bar))(?{$resu +lt{$_}++}) )|(?-xism:(?:(12345))(?{$result{$_}++}))|(?-xism:(?:(abcd))(?{$result{ +$_}++})))

    you'll see that $_ variable is not replaced by the keyword values. To my understanding, it is because (?{...}) contains code that will be interpreted later, so perl does not interpret in at that time, leaving it unmodified.

    So, to do what we want here, I use (...) to catch a execution time the match and set the correct hash entry. There is a final problem : the name of the variable.

    my $data = "foofdsfdsbar"; my @matches = $data =~ m/(foo)(?{print "($1,$2,$+)"})|(bar)(?{print "($1,$2,$+)" +})/g; __DATA__ (foo,,foo)(,bar,bar)

    That is to say : $+ is the only one that surely contains the last pattern matched (BTW, I know about performance penalty when using it, but I won't care for now ;)

    So, let's code it !

    #!/usr/bin/perl use strict; use re 'eval'; my @keywords = qw/foo bar 12345 abcd/; my ($string) = "foobarfoo1234523423412345abcdefsadfabc"; our %result; my @regexen = map { qr/(?:(\Q$_\E))(?{$result{$+}++})/ } @keywords; my $re = do { local $" = '|'; qr/@regexen/; }; my @match = $string =~ /$re/g; print "$_: $result{$_}\n" for @keywords;

    --
    zejames

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://412806]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (5)
As of 2019-10-16 08:04 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Notices?